[ https://issues.apache.org/jira/browse/IMPALA-13484?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Gabor Kaszab resolved IMPALA-13484.
-----------------------------------
    Fix Version/s: Impala 4.5.0
         Assignee: Gabor Kaszab
       Resolution: Fixed

> Querying an Iceberg table with TIMESTAMP_LTZ can cause data loss
> ----------------------------------------------------------------
>
>                 Key: IMPALA-13484
>                 URL: https://issues.apache.org/jira/browse/IMPALA-13484
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Frontend
>            Reporter: Gabor Kaszab
>            Assignee: Gabor Kaszab
>            Priority: Major
>              Labels: impala-iceberg
>             Fix For: Impala 4.5.0
>
> *+Repro steps:+*
> 1. Create a table with Hive that has a TS_LTZ column:
> {code:java}
> create table ice_hive_tbl (i int, ts_ltz timestamp with local time zone)
> stored by iceberg;
> {code}
> 2. Insert some data using Hive:
> {code:java}
> insert into ice_hive_tbl values (1, current_timestamp());
> {code}
> 3. Add a breakpoint in Impala to the table loading code, right before
> Impala sends an alter_table to HMS to change the column type from TS_LTZ
> to TS.
> [Here|https://github.com/apache/impala/blob/c83e5d97693fd3035b33622512d1584a5e56ce8b/fe/src/main/java/org/apache/impala/catalog/IcebergTable.java#L463],
> for instance.
> 4. Query the table from Impala. This triggers a table load. Impala
> decides that it should change the TS_LTZ column type to TS, but the
> breakpoint holds it right before doing so.
> 5. Use Hive to add additional rows to the table:
> {code:java}
> insert into ice_hive_tbl values (2, current_timestamp());
> insert into ice_hive_tbl values (3, current_timestamp());
> {code}
> 6. Release the breakpoint and let Impala finish the SELECT query started
> at 4).
> 7. Run another SELECT * from Hive and/or Impala and verify that the extra
> rows added at 5) are not present in the table.
> *+Root cause:+*
> When Impala changes the TS_LTZ column to TS, it does so by calling
> alter_table() on HMS directly, handing HMS a Metastore Table object as
> the desired state of the table. HMS then persists this table object.
> The problems with this:
> - Impala doesn't use the Iceberg API to alter the table. As a result, no
> conflict detection is performed, so it goes unnoticed that other commits
> went into the table since Impala grabbed the table object from HMS.
> - The metadata.json path is part of the table properties, so when Impala
> calls alter_table(tbl_obj), HMS also persists this metadata path, even
> though other changes have already moved the metadata path forward.
> - Essentially, this reverts the table back to the state it was in when
> Impala loaded the table object from HMS.
> - In a high-frequency scenario this can cause problems when Hive (or
> Spark) heavily writes the table while Impala reads it. Some snapshots can
> be unintentionally reverted by this behavior, resulting in data loss or
> in changes such as deletes being rolled back. The sketch below
> illustrates this lost-update race.
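> The following is a minimal sketch of the race, not Impala's actual code:
> the msClient parameter stands for an ordinary Hive metastore client, and
> the metadata_location values are made up for illustration.
> {code:java}
> import org.apache.hadoop.hive.metastore.IMetaStoreClient;
> import org.apache.hadoop.hive.metastore.api.FieldSchema;
> import org.apache.hadoop.hive.metastore.api.Table;
>
> void loseCommits(IMetaStoreClient msClient) throws Exception {
>   // Table object loaded from HMS; its parameters carry the current
>   // Iceberg pointer, e.g. metadata_location = .../00001-abc.metadata.json
>   Table tbl = msClient.getTable("default", "ice_hive_tbl");
>
>   // ... two concurrent Hive inserts commit here and move the real
>   // pointer forward to .../00003-def.metadata.json ...
>
>   // Impala rewrites the incompatible column type on the stale object:
>   for (FieldSchema col : tbl.getSd().getCols()) {
>     if (col.getType().equals("timestamp with local time zone")) {
>       col.setType("timestamp");
>     }
>   }
>
>   // alter_table() persists the whole stale object, including the old
>   // metadata_location, so the pointer jumps back to 00001 and the two
>   // newer snapshots become unreachable: the data loss from the repro.
>   msClient.alter_table("default", "ice_hive_tbl", tbl);
> }
> {code}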
> {+}Just a note{+}, FWIW, with the current approach Impala doesn't change
> the column types in the Iceberg metadata, it only changes the column
> types in HMS. So even with this, the Iceberg metadata would still show
> the column type as timestamptz.
> {+}Note2{+}, I described this problem using timestamp with local time
> zone as an example, but it could also be triggered by other column types
> that are not entirely compatible with Impala. I haven't done the research
> to find out whether there is any other such type, though.
> {+}Note3{+}, this issue seems to have been there forever. The code that
> triggers it was added by one of the first changes wrt the Iceberg
> integration, the "[Create Iceberg
> table|https://github.com/apache/impala/commit/8fcad905a12d018eb0a354f7e4793e5b0d5efd3b]"
> change.
> *+Possible solutions:+*
> 1. Impala can do the alter table by calling the Iceberg API instead of
> HMS directly (see the sketch at the end of this message).
> There are things to be careful about:
> - With this approach, would the above repro steps make the table loading
> fail due to a conflict between commits on the table? Or could the schema
> change be merged automatically by the Iceberg library onto the latest
> state even if there had been changes on the table? I think this would
> work as expected and wouldn't reject loading the table, but we should
> verify this when testing.
> - With this approach Impala would set the TS_LTZ cols to TS properly,
> causing no snapshots to be lost. However, when a new write is performed
> by Hive/Spark, they'd set the col types back to TS_LTZ. And then when
> Impala reads the table again, it will set these cols to TS again. And so
> on. The question is, would a scenario like this flood the Iceberg
> metadata, e.g. metadata.json, with all these uncontrolled schema changes?
> - We talk about schema changes here, but what the code does now is in
> fact much wider than that: it sends a whole table object to HMS to
> persist. We have to double check whether the current approach only
> persists schema changes or could make other changes too. E.g. the code
> also sets the DO_NOT_UPDATE_STATS property and the last DDL time. Could
> it change anything else as well that we might miss with this approach?
> 2. Do not do any alter_tables after loading the Iceberg table.
> This approach would simply drop the code [after this
> line|https://github.com/apache/impala/blob/c83e5d97693fd3035b33622512d1584a5e56ce8b/fe/src/main/java/org/apache/impala/catalog/IcebergTable.java#L463]
> and wouldn't make any HMS schema changes. Impala could then internally
> use the adjusted column types without changing the types of the columns
> in HMS. The question here is whether this would break any use cases.
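> Below is a rough sketch of what solution 1 could look like; icebergTable
> is a placeholder for an org.apache.iceberg.Table handle. Note that it is
> an open question whether UpdateSchema.updateColumn() even accepts a
> timestamptz -> timestamp change, since that is not one of Iceberg's
> documented type promotions; the point here is only to show where the
> conflict detection would kick in.
> {code:java}
> import org.apache.iceberg.Table;
> import org.apache.iceberg.types.Types;
>
> void demoteLtzColumn(Table icebergTable) {
>   icebergTable.updateSchema()
>       .updateColumn("ts_ltz", Types.TimestampType.withoutZone())
>       // commit() goes through the catalog's check-and-swap on the
>       // metadata pointer, so a concurrent Hive/Spark commit surfaces as
>       // a CommitFailedException instead of being silently reverted.
>       .commit();
> }
> {code}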