[ 
https://issues.apache.org/jira/browse/IMPALA-11293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on IMPALA-11293 started by Tamas Mate.
-------------------------------------------
> Add COMPACT command for Iceberg tables
> --------------------------------------
>
>                 Key: IMPALA-11293
>                 URL: https://issues.apache.org/jira/browse/IMPALA-11293
>             Project: IMPALA
>          Issue Type: Improvement
>          Components: Frontend
>            Reporter: Zoltán Borók-Nagy
>            Assignee: Tamas Mate
>            Priority: Major
>              Labels: impala-iceberg
>
> Currently Impala cannot compact Iceberg tables.
> The following INSERT OVERWRITE statement could be used in simple cases, 
> i.e. when the following conditions are met:
>  * all data files use the same partition spec (i.e. no partition evolution)
>  * no bucket partitioning (we currently forbid INSERT OVERWRITE for bucket 
> partitioning)
> {noformat}
> INSERT OVERWRITE t SELECT * FROM t;{noformat}
> We could have a command that compacts the Iceberg table (the syntax needs to 
> match Hive's), e.g.:
> {noformat}
> ALTER TABLE t EXECUTE compaction();{noformat}
> At first, the compact command could simply be rewritten to the INSERT 
> OVERWRITE command, with an added check that there is no partition evolution.
> The "no bucket partitioning" condition could be relaxed in this case, because 
> the result would be deterministic. I.e. the only condition we need to check 
> is that there was no partition evolution.
> Later, we could do compaction by running:
> {noformat}
> TRUNCATE TABLE t;
> INSERT INTO t SELECT * FROM t FOR SYSTEM_TIME AS OF ...;{noformat}
> Currently time-travel queries are not optimized, but we could work around 
> this by planning both statements first:
> {noformat}
> Create the plan for:
> TRUNCATE TABLE t;
> INSERT INTO t SELECT * FROM t;{noformat}
> Then execute them:
> {noformat}
> Actually execute:
> TRUNCATE TABLE t;
> INSERT INTO t SELECT * FROM t; (no need for time-travel, plan was created 
> before TRUNCATE){noformat}
> This could work around the planning overhead of time-travel queries.
> Also, we might add some locking for the table if possible.
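
The two approaches in the description can be sketched roughly as follows. This is only an illustration in Python, not Impala code: `DataFile`, `PartitionEvolutionError`, `rewrite_compaction`, `compact_with_preplanned_insert`, and the `plan`/`execute` callables are all hypothetical names made up for the sketch. It assumes each data file records the id of the partition spec it was written with.

```python
from dataclasses import dataclass
from typing import Any, Callable, List


@dataclass(frozen=True)
class DataFile:
    """Hypothetical stand-in for one data file of an Iceberg table."""
    path: str
    partition_spec_id: int  # id of the partition spec used to write the file


class PartitionEvolutionError(Exception):
    """Raised when the data files use more than one partition spec."""


def rewrite_compaction(table: str, data_files: List[DataFile]) -> str:
    """First approach: rewrite the compaction to INSERT OVERWRITE.

    The only precondition checked is the absence of partition evolution,
    i.e. every data file must share the same partition spec id.
    """
    spec_ids = {f.partition_spec_id for f in data_files}
    if len(spec_ids) > 1:
        raise PartitionEvolutionError(
            f"table {table} has data files with partition specs {sorted(spec_ids)}"
        )
    return f"INSERT OVERWRITE {table} SELECT * FROM {table}"


def compact_with_preplanned_insert(
    table: str,
    plan: Callable[[str], Any],
    execute: Callable[[Any], None],
) -> None:
    """Second approach: plan the INSERT before the TRUNCATE, then execute both.

    Because the plan is created while the table still has its files, the
    INSERT does not need a time-travel clause when it finally runs.
    `plan` and `execute` are assumed hooks, not real Impala APIs.
    """
    insert_plan = plan(f"INSERT INTO {table} SELECT * FROM {table}")
    execute(f"TRUNCATE TABLE {table}")
    execute(insert_plan)
```

The sketch leaves out locking entirely; if the table changes between planning and the TRUNCATE, the pre-built plan could be stale, which is presumably why the description mentions locking.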



--
This message was sent by Atlassian Jira
(v8.20.7#820007)
