[ 
https://issues.apache.org/jira/browse/IMPALA-13484?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gabor Kaszab resolved IMPALA-13484.
-----------------------------------
    Fix Version/s: Impala 4.5.0
         Assignee: Gabor Kaszab
       Resolution: Fixed

> Querying an Iceberg table with TIMESTAMP_LTZ can cause data loss
> ----------------------------------------------------------------
>
>                 Key: IMPALA-13484
>                 URL: https://issues.apache.org/jira/browse/IMPALA-13484
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Frontend
>            Reporter: Gabor Kaszab
>            Assignee: Gabor Kaszab
>            Priority: Major
>              Labels: impala-iceberg
>             Fix For: Impala 4.5.0
>
>
> *+Repro steps:+*
> 1. Create a table with Hive that has a TS_LTZ column:
> {code:java}
> create table ice_hive_tbl (i int, ts_ltz timestamp with local time zone) 
> stored by iceberg;
> {code}
> 2. Insert some data using Hive:
> {code:java}
> insert into ice_hive_tbl values (1, current_timestamp());
> {code}
> 3. Add a breakpoint in Impala's table loading code, right before Impala 
> sends an alter_table to HMS to change the column type from TS_LTZ to TS. 
> [Here|https://github.com/apache/impala/blob/c83e5d97693fd3035b33622512d1584a5e56ce8b/fe/src/main/java/org/apache/impala/catalog/IcebergTable.java#L463],
>  for instance.
> 4. Query the table from Impala. This triggers a table load. Impala will 
> decide that it should change the column's TS_LTZ type to TS, but the 
> breakpoint will hold it from doing so at this point.
> 5. Use Hive to add additional rows into the table:
> {code:java}
> insert into ice_hive_tbl values (2, current_timestamp());
> insert into ice_hive_tbl values (3, current_timestamp());
> {code}
> 6. Release the breakpoint and let Impala finish the SELECT query started in 
> step 4.
> 7. Run another SELECT * from Hive and/or Impala and verify that the extra 
> rows added in step 5 are no longer present in the table.
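> For instance, a verification query (the ordering is added here just for 
> readability) would show only the first row if the bug reproduces:
> {code:java}
> select * from ice_hive_tbl order by i;
> -- if the bug reproduced, only the row with i=1 comes back; the rows with
> -- i=2 and i=3 inserted in step 5 are gone
> {code}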
> *+Root cause:+*
> When Impala changes the TS_LTZ column to TS, it does so by calling 
> alter_table() on HMS directly, handing HMS a Metastore Table object as the 
> desired state of the table. HMS then persists this table object.
> The problem with this:
>  - Impala doesn't use the Iceberg API to alter the table. As a result, no 
> conflict detection is performed, and it won't be detected that other commits 
> went into the table since Impala grabbed the table object from HMS.
>  - The metadata.json path is part of the table properties, and when Impala 
> calls alter_table(tbl_obj) HMS will also persist this metadata path to the 
> table, even though there were other changes that already moved the metadata 
> path forward.
>  - Essentially this reverts the table back to the state it was in when 
> Impala loaded the table object from HMS (see the condensed sketch after 
> this list).
>  - In a high-frequency scenario this could cause problems when Hive (or even 
> Spark) heavily writes the table while Impala reads it. Some snapshots could 
> be unintentionally reverted by this behavior, resulting in data loss or in 
> changes such as deletes being undone.
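> A condensed, hypothetical sketch of that pattern (variable names are made 
> up; the real logic lives in IcebergTable.java, linked above):
> {code:java}
> // Stale snapshot of the HMS table object, taken at load time:
> org.apache.hadoop.hive.metastore.api.Table msTbl =
>     hmsClient.getTable("default", "ice_hive_tbl");
> 
> // ... meanwhile Hive commits new snapshots, moving the table's
> // 'metadata_location' property forward in HMS ...
> 
> // Impala rewrites the column type in its stale copy:
> for (org.apache.hadoop.hive.metastore.api.FieldSchema col :
>     msTbl.getSd().getCols()) {
>   if (col.getType().equalsIgnoreCase("timestamp with local time zone")) {
>     col.setType("timestamp");
>   }
> }
> 
> // alter_table() persists the whole stale object, including the old
> // 'metadata_location', silently reverting the concurrent Hive commits:
> hmsClient.alter_table("default", "ice_hive_tbl", msTbl);
> {code}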
> {+}Just a note{+}, FWIW: with the current approach Impala doesn't change the 
> column types in the Iceberg metadata, only in HMS. So even with this, the 
> Iceberg metadata would still show the column type as timestamptz.
> {+}Note2{+}, I described this problem using timestamp with local time zone 
> as an example, but it could also be triggered by other column types that 
> aren't entirely compatible with Impala. I haven't researched whether any 
> other such type exists, though.
> {+}Note3{+}, this issue seems to have been present forever. The code that 
> triggers it was added by one of the first changes wrt the Iceberg 
> integration, the "[Create Iceberg 
> table|https://github.com/apache/impala/commit/8fcad905a12d018eb0a354f7e4793e5b0d5efd3b]" 
> change.
> *+Possible solutions:+*
> 1. Impala can do the alter table by calling the Iceberg API instead of HMS 
> directly (a hypothetical sketch follows the list of caveats below).
> There are things to be careful about:
>  - With this approach, would the above repro steps make the table loading 
> fail due to a conflict between commits on the table? Or could the schema 
> change be merged automatically by the Iceberg library onto the latest state 
> even if there had been changes on the table? I think this would work as 
> expected and wouldn't reject loading the table, but we should make sure 
> when testing this.
>  - With this approach Impala would properly set the TS_LTZ cols to TS, 
> causing no snapshots to be lost. However, when a new write is performed by 
> Hive/Spark, they'd set the col types back to TS_LTZ, and when Impala reads 
> the table again it would set these cols to TS again, and so on. The 
> question is whether a scenario like this would flood the Iceberg metadata, 
> e.g. metadata.json, with all these uncontrolled schema changes.
>  - We talk about schema changes here, but what the code does now is in 
> fact much broader than that: it sends a whole table object to HMS to 
> persist. We have to double-check whether the current approach only persists 
> schema changes or could make other changes too. E.g. the code also sets the 
> DO_NOT_UPDATE_STATS property and the last DDL time. Could it change 
> anything else as well that we might miss with this approach?
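> A minimal, hypothetical sketch of option 1, assuming the table is reached 
> through Iceberg's HiveCatalog (catalog name, metastore URI and identifiers 
> are made up; whether Iceberg even allows a timestamptz -> timestamp change 
> is one of the things to verify):
> {code:java}
> import org.apache.iceberg.Table;
> import org.apache.iceberg.catalog.TableIdentifier;
> import org.apache.iceberg.hive.HiveCatalog;
> import org.apache.iceberg.types.Types;
> 
> HiveCatalog catalog = new HiveCatalog();
> catalog.initialize("hive", java.util.Map.of("uri", "thrift://metastore:9083"));
> Table table = catalog.loadTable(TableIdentifier.of("default", "ice_hive_tbl"));
> 
> // commit() runs Iceberg's conflict detection against concurrent commits,
> // unlike a raw HMS alter_table(). Iceberg may also reject
> // timestamptz -> timestamp as an incompatible type change, which is exactly
> // what the testing mentioned above needs to confirm.
> table.updateSchema()
>     .updateColumn("ts_ltz", Types.TimestampType.withoutZone())
>     .commit();
> {code}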
> 2. Do not do any alter_table after loading the Iceberg table.
> This approach would simply drop the code [after this 
> line|https://github.com/apache/impala/blob/c83e5d97693fd3035b33622512d1584a5e56ce8b/fe/src/main/java/org/apache/impala/catalog/IcebergTable.java#L463]
>  and wouldn't make any HMS schema changes. Impala could then use the 
> adjusted column types internally without changing the column types in HMS. 
> The question here is whether this would break any use cases.
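> A rough sketch of option 2, with hypothetical helper names; the point is 
> that the timestamptz -> TIMESTAMP mapping stays in Impala's in-memory 
> catalog objects and nothing is written back to HMS:
> {code:java}
> // Hypothetical: derive Impala's in-memory column types from the Iceberg
> // schema and stop there, i.e. never call hmsClient.alter_table().
> for (Types.NestedField field : icebergApiTable.schema().columns()) {
>   // Iceberg uses the TIMESTAMP type id for both timestamp and timestamptz;
>   // map both to Impala's TIMESTAMP in memory only.
>   boolean isTimestamp =
>       field.type().typeId() == org.apache.iceberg.types.Type.TypeID.TIMESTAMP;
>   addImpalaColumn(field.name(),  // hypothetical helper
>       isTimestamp ? "timestamp" : field.type().toString());
> }
> // No alter_table() here, so HMS keeps whatever column types and
> // 'metadata_location' it already has.
> {code}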



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
