[
https://issues.apache.org/jira/browse/IMPALA-11293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Work on IMPALA-11293 started by Tamas Mate.
-------------------------------------------
> Add COMPACT command for Iceberg tables
> --------------------------------------
>
> Key: IMPALA-11293
> URL: https://issues.apache.org/jira/browse/IMPALA-11293
> Project: IMPALA
> Issue Type: Improvement
> Components: Frontend
> Reporter: Zoltán Borók-Nagy
> Assignee: Tamas Mate
> Priority: Major
> Labels: impala-iceberg
>
> Currently Impala cannot compact Iceberg tables.
> The following INSERT OVERWRITE statement could be used in the simple cases,
> i.e. when the following conditions meet:
> * all data files use the same partition spec (i.e. no partition evolution)
> * no bucket partitioning (we currently forbid INSERT OVERWRITE for bucket
> partitioning)
> {noformat}
> INSERT OVERWRITE t SELECT * FROM t;{noformat}
> We could have a command that compacts the Iceberg table (syntax needs to be
> the same with Hive), e.g.:
> {noformat}
> ALTER TABLE t EXECUTE compaction();{noformat}
> At first, the compact command could be just rewritten to the INSERT OVERWRITE
> command, but it would also check that there's no partition evolution.
> The "no bucket" partitioning condition could be relaxed in this case, because
> the result would be deterministic. I.e. the only condition we need to check
> is that there was no partition evolution.
> Later, we could do compaction by
> {noformat}
> TRUNCATE TABLE t;
> INSERT INTO t SELECT * FROM t FOR SYSTEM_TIME AS OF ...;{noformat}
> Currently time-travel queries are not optimized, but we could workaround it
> by doing planning at first of:
> {noformat}
> Create the plan for:
> TRUNCATE TABLE t;
> INSERT INTO t SELECT * FROM t;{noformat}
> Then execute them:
> {noformat}
> Actually execute:
> TRUNCATE TABLE t;
> INSERT INTO t SELECT * FROM t; (no need for time-travel, plan was created
> before TRUNCATE){noformat}
> This could workaround the planning overhead of time-travel queries.
> Also, we might add some locking for the table if possible.
--
This message was sent by Atlassian Jira
(v8.20.7#820007)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]