[ https://issues.apache.org/jira/browse/IMPALA-11590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17608441#comment-17608441 ]
ASF subversion and git services commented on IMPALA-11590:
----------------------------------------------------------
Commit 3f382b7ebbd66a5a02270e14ff493bd9607c0b94 in impala's branch
refs/heads/master from Zoltan Borok-Nagy
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=3f382b7eb ]
IMPALA-11583: Use Iceberg API to update stats
Before this patch we used the HMS API alter_table() to update an Iceberg
table's statistics. alter_table() calls are unsafe for Iceberg tables
because they overwrite the whole HMS table, including the table property
'metadata_location', which must always point to the latest snapshot.
Hence a concurrent modification to the same table could be reverted by
COMPUTE STATS.
This patch uses the Iceberg API to update Iceberg tables instead.
Also, table-level stats (e.g. numRows, totalSize, totalFiles) are not
set, because Iceberg keeps them up-to-date.
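The race described above can be illustrated with a small toy model. This is
only a sketch: ToyTable, its methods, the file names and the 'stats_computed'
key are all hypothetical and are not the HMS or Iceberg API. It assumes that
the safe path behaves like a version-checked commit that is retried after
reloading the table, which is one common way to avoid lost updates; the
commit message does not say which Iceberg calls the patch uses.

```python
class ToyTable:
    """Hypothetical stand-in for a table entry: properties plus a version."""

    def __init__(self, props):
        self.props = dict(props)
        self.version = 0

    def load(self):
        # Return the current version and a private copy of the properties.
        return self.version, dict(self.props)

    def overwrite(self, props):
        # Whole-table replace with no version check (the unsafe pattern).
        self.props = dict(props)
        self.version += 1

    def commit_if_unchanged(self, base_version, props):
        # Version-checked commit: refuse if someone else committed first.
        if base_version != self.version:
            return False
        self.props = dict(props)
        self.version += 1
        return True


def concurrent_insert(table):
    # Models another engine committing a new snapshot in the meantime.
    version, props = table.load()
    props["metadata_location"] = "v2.metadata.json"
    assert table.commit_if_unchanged(version, props)


def stats_via_overwrite():
    table = ToyTable({"metadata_location": "v1.metadata.json"})
    _, stale = table.load()            # stats job reads the table
    concurrent_insert(table)           # writer commits v2 in between
    stale["stats_computed"] = "true"   # hypothetical stats property
    table.overwrite(stale)             # stale copy clobbers v2
    return table.props


def stats_via_checked_commit(max_retries=3):
    table = ToyTable({"metadata_location": "v1.metadata.json"})
    version, props = table.load()
    concurrent_insert(table)
    for _ in range(max_retries):
        props["stats_computed"] = "true"
        if table.commit_if_unchanged(version, props):
            break
        version, props = table.load()  # reload and retry on conflict
    return table.props


if __name__ == "__main__":
    print(stats_via_overwrite())
    print(stats_via_checked_commit())
```

In the overwrite scenario 'metadata_location' ends up pointing back at the
old file, so the concurrent insert is lost. In the checked-commit scenario
the first attempt is rejected, the table is reloaded, and the stats property
is added on top of the newer 'metadata_location'.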
COMPUTE INCREMENTAL STATS without a PARTITION clause is the same as
plain COMPUTE STATS for Iceberg tables. This is consistent with the
current behavior on non-partitioned tables:
https://impala.apache.org/docs/build/html/topics/impala_compute_stats.html
COMPUTE INCREMENTAL STATS .. PARTITION raises an error.
DROP STATS has also been modified so that it does not drop table-level
stats for HMS-integrated Iceberg tables.
Testing:
* added e2e tests for COMPUTE STATS
* added e2e tests for DROP STATS
* manually tested concurrent Hive INSERT and Impala COMPUTE STATS
using latest Hive
* opened IMPALA-11590 to add automated interop tests with Hive
Change-Id: I46b6e0a5a65e18e5aaf2a007ec0242b28e0fed92
Reviewed-on: http://gerrit.cloudera.org:8080/18995
Reviewed-by: Impala Public Jenkins <[email protected]>
Tested-by: Impala Public Jenkins <[email protected]>
> Add test for concurrent Hive INSERT and Impala COMPUTE STATS
> ------------------------------------------------------------
>
> Key: IMPALA-11590
> URL: https://issues.apache.org/jira/browse/IMPALA-11590
> Project: IMPALA
> Issue Type: Test
> Reporter: Zoltán Borók-Nagy
> Priority: Major
> Labels: impala-iceberg
>
> COMPUTE STATS should use the Iceberg API to update table properties.
> Otherwise it could inadvertently override concurrent modifications to an
> Iceberg table.
> Since IMPALA-11583 we have been using the Iceberg API to update stats, but we
> didn't have the latest Hive that can deal with Iceberg tables, hence we could
> not add interop tests for the above scenario.
> Add tests for DROP STATS as well.