[ https://issues.apache.org/jira/browse/IMPALA-13484?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17894632#comment-17894632 ]

ASF subversion and git services commented on IMPALA-13484:
----------------------------------------------------------

Commit c53681a40d115aafa3d720a2bcef6b2dac492cad in impala's branch 
refs/heads/master from Gabor Kaszab
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=c53681a40 ]

IMPALA-13484: Don't call alter_table() on HMS when loading Iceberg table

When Impala loads an Iceberg table it also loads the metastore
representation from HMS. Additionally, when Impala loads the Iceberg
metadata of the table it constructs another metastore representation of
that metadata. If these two metastore representations don't match,
Impala calls alter_table() on HMS to persist the differences.

In practice this behaviour is triggered, for instance, when another engine
creates a table with a column type that Impala can't read, so Impala
adjusts the column type instead. E.g. if an engine creates a
'timestamp with local time zone' column, Impala will change the column
type in the HMS representation to 'timestamp'.
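
For illustration, here is a minimal sketch of the reconciliation pattern
this change removes. Only the HMS client call is a real API;
convertIcebergSchemaToHms and the variable names are assumptions for the
sketch, not actual Impala code.

{code:java}
// Hypothetical simplification of the removed load-time logic. msClient is
// an org.apache.hadoop.hive.metastore.IMetaStoreClient.
org.apache.hadoop.hive.metastore.api.Table fromIceberg =
    convertIcebergSchemaToHms(iceTbl.schema(), msTbl);  // assumed helper
// E.g. a 'timestamp with local time zone' column is downgraded to
// 'timestamp' in fromIceberg, because Impala cannot read TIMESTAMPTZ.
if (!msTbl.getSd().getCols().equals(fromIceberg.getSd().getCols())) {
  // Direct HMS call: no Iceberg conflict checks are performed, and the
  // possibly stale table properties are persisted along with the schema.
  msClient.alter_table(dbName, tblName, fromIceberg);
}
{code}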

There are some issues with this approach:
1: Impala calls HMS.alter_table() directly and doesn't change the table
   through the Iceberg API. As a result no conflict checks are
   performed. Since the metadata location is a simple table property
   this alter_table() call can overwrite the metadata pointer in case
   there had been other changes to the table since Impala started to
   load it. This can result in data correctness issues.
2: Even in use cases where Impala only reads a table, it can still
   perform table modifications, which is very surprising behaviour.
3: With this approach Impala changes the table schema in HMS but doesn't
   change the Iceberg schema in the Iceberg metadata.

As a solution we can simply get rid of the logic that makes the
alter_table() call to HMS at the end of loading an Iceberg table. This
avoids a lot of confusion, and in fact persisting the schema
adjustments Impala made during table loading is not necessary.

Testing:
Ran the whole exhaustive test suite.

Change-Id: Icd5d1eee421f22d3853833dc571b14d4e9005ab3
Reviewed-on: http://gerrit.cloudera.org:8080/21993
Reviewed-by: Impala Public Jenkins <[email protected]>
Tested-by: Impala Public Jenkins <[email protected]>


> Querying an Iceberg table with TIMESTAMP_LTZ can cause data loss
> ----------------------------------------------------------------
>
>                 Key: IMPALA-13484
>                 URL: https://issues.apache.org/jira/browse/IMPALA-13484
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Frontend
>            Reporter: Gabor Kaszab
>            Priority: Major
>              Labels: impala-iceberg
>
> *+Repro steps:+*
> 1. Create a table with Hive that has a TS_LTZ column:
> {code:java}
> create table ice_hive_tbl (i int, ts_ltz timestamp with local time zone) 
> stored by iceberg;
> {code}
> 2. Insert some data using Hive:
> {code:java}
> insert into ice_hive_tbl values (1, current_timestamp());
> {code}
> 3. Add a breakpoint in Impala's table loading code right before Impala 
> sends an alter_table to HMS to change the column type from TS_LTZ to TS. 
> [Here|https://github.com/apache/impala/blob/c83e5d97693fd3035b33622512d1584a5e56ce8b/fe/src/main/java/org/apache/impala/catalog/IcebergTable.java#L463],
>  for instance.
> 4. Query the table from Impala. This triggers a table load. Impala will 
> decide that it should change the column's TS_LTZ type to TS. However, the 
> breakpoint will hold it back from doing this at this point.
> 5. Use Hive to add additional rows into the table:
> {code:java}
> insert into ice_hive_tbl values (2, current_timestamp());
> insert into ice_hive_tbl values (3, current_timestamp());
> {code}
> 6. Release the breakpoint and let Impala finish the SELECT query started in step 4.
> 7. Do another SELECT * from Hive and/or Impala and verify that the extra rows 
> added in step 5 are not present in the table.
> *+Root cause:+*
> When Impala changes the TS_LTZ column to TS it does so by calling 
> alter_table() on HMS directly. It gives a Metastore Table object to HMS as 
> the desired state of the table. HMS then persists this table object.
> The problem with this:
>  - Impala doesn't use the Iceberg API to alter the table. As a result there 
> is no conflict detection performed, and it won't be detected that other 
> commits went into the table since Impala grabbed the table object from HMS.
>  - The metadata.json path is part of the table properties, and when Impala 
> calls alter_table(tbl_obj) HMS will also persist this metadata path to the 
> table, even though there were other changes that already moved the metadata 
> path forward.
>  - Essentially this will revert the changes on the table back to the state 
> when Impala loaded the table object from HMS.
>  - In a high-frequency scenario this could cause problems when Hive (or even 
> Spark) heavily writes the table while Impala reads it. Some snapshots could 
> be unintentionally reverted by this behavior, resulting in data loss or 
> changes such as deletes being reverted.
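> To make the lost update concrete, here is a minimal sketch of the race. The 
> HMS client calls and the 'metadata_location' property key are real; the 
> variable names and timeline are illustrative assumptions.
> {code:java}
> // T0: Impala reads the table object; its properties include the Iceberg
> // metadata pointer, e.g. metadata_location = ".../v2.metadata.json".
> Table msTbl = msClient.getTable(dbName, tblName);
> String pointerAtLoad = msTbl.getParameters().get("metadata_location");
> 
> // T1: Hive/Spark commit new snapshots through the Iceberg API; HMS now
> // points at .../v4.metadata.json.
> 
> // T2: Impala persists its column-type adjustment with a direct HMS call.
> // The whole table object is written back, including the stale pointer, so
> // metadata_location silently reverts to .../v2.metadata.json and the
> // snapshots committed at T1 are lost.
> msClient.alter_table(dbName, tblName, msTbl);
> {code}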
> +Just a note+, FWIW, with the current approach Impala doesn't change the 
> column types in the Iceberg metadata, it only changes the column types in 
> HMS. So even with this, the Iceberg metadata would still show the column 
> type as timestamptz.
> +Note2+, I described this problem using timestamp with local time zone as 
> an example, but it could also be triggered by other column types not entirely 
> compatible with Impala. I haven't done the research to find out whether there 
> is any other such type, though.
> +Note3+, this issue seems to have been there forever. The code that 
> triggers it was added by one of the first changes wrt the Iceberg 
> integration, the "[Create Iceberg 
> table|https://github.com/apache/impala/commit/8fcad905a12d018eb0a354f7e4793e5b0d5efd3b]" 
> change.
> *+Possible solutions:+*
> 1. Impala can do the alter table by calling the Iceberg API and not HMS 
> directly (see the sketch after this list).
> There are things to be careful about:
>  - With this approach, would the above repro steps make the table loading 
> fail due to a conflict between commits on the table? Or could the schema 
> change be merged automatically by the Iceberg lib into the latest state even 
> if there had been changes on the table? I think this would work as expected 
> and won't reject loading the table, but we should make sure when testing this.
>  - With this approach Impala would set the TS_LTZ cols to TS properly, 
> causing no snapshots to be lost. However, when a new write is performed by 
> Hive/Spark, they'd set the col types back to TS_LTZ. And then when Impala 
> reads the table again, it will set these cols to TS again. And so on. The 
> question is, would a scenario like this flood the Iceberg metadata, e.g. 
> metadata.json, with all these uncontrolled schema changes?
>  - So far we have talked about schema changes, but in fact what the code 
> does now is much wider than that. It sends a whole table object to HMS to 
> persist. We have to double-check whether the current approach only persists 
> schema changes or could make other changes too. E.g. the code also sets the 
> DO_NOT_UPDATE_STATS property and the last DDL time. Could it change anything 
> else as well that we might miss with this approach?
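> Here is a rough sketch of what option 1 could look like with the real 
> Iceberg API ('catalog' and 'tableIdent' are assumed to come from the 
> surrounding code). UpdateSchema commits go through Iceberg's optimistic 
> concurrency control, so a concurrent commit would be detected rather than 
> overwritten. Note, though, that Iceberg only permits a limited set of type 
> promotions (e.g. int -> long), so whether a timestamptz -> timestamp change 
> is even accepted needs verifying.
> {code:java}
> import org.apache.iceberg.Table;
> import org.apache.iceberg.types.Types;
> 
> Table table = catalog.loadTable(tableIdent);
> table.updateSchema()
>     .updateColumn("ts_ltz", Types.TimestampType.withoutZone())
>     .commit();  // conflict-checked commit via the Iceberg catalog
> {code}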
> 2. Do not do any alter_table after loading the Iceberg table
> This approach would simply drop the code [after this 
> line|https://github.com/apache/impala/blob/c83e5d97693fd3035b33622512d1584a5e56ce8b/fe/src/main/java/org/apache/impala/catalog/IcebergTable.java#L463]
>  and wouldn't do any HMS schema changes. Impala could then internally use the 
> adjusted column types but wouldn't change the types of the columns in HMS. 
> The question here is whether this would break any use cases. A sketch of the 
> idea follows below.
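> A hypothetical sketch of option 2: apply the downgrade only to Impala's 
> in-memory HMS representation and never write it back. 'msTable_' and the 
> loop are illustrative assumptions, not actual Impala code; FieldSchema is 
> the real HMS class.
> {code:java}
> // Adjust the column type only in the in-memory representation; since
> // alter_table() is never called, nothing is persisted to HMS.
> for (FieldSchema col : msTable_.getSd().getCols()) {
>   if ("timestamp with local time zone".equals(col.getType())) {
>     col.setType("timestamp");  // local, in-memory adjustment only
>   }
> }
> {code}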


