[ https://issues.apache.org/jira/browse/IMPALA-12293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17770187#comment-17770187 ]

ASF subversion and git services commented on IMPALA-12293:
----------------------------------------------------------

Commit 2d3289027c2ffdd245d13b60e6fa3f9b3e7bf833 in impala's branch 
refs/heads/master from Noemi Pap-Takacs
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=2d3289027 ]

IMPALA-12406: OPTIMIZE statement as an alias for INSERT OVERWRITE

If an Iceberg table is frequently updated or written to in small batches,
many small files are created, which decreases read performance.
Similarly, frequent row-level deletes contribute to this problem
by creating delete files, which have to be merged on read.

Until now, INSERT OVERWRITE (rewriting the table with its own contents)
has been used to compact Iceberg tables.
However, it comes with some restrictions:
- The table must not have multiple partition specs (partition evolution).
- The table must not contain complex types.

The OPTIMIZE statement offers a new syntax, and a solution limited to
Iceberg tables, for improving the read performance of subsequent operations.
See IMPALA-12293 for details.

Syntax: OPTIMIZE TABLE <table_name>;

This first patch introduces the new syntax, temporarily as an alias
for INSERT OVERWRITE.
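To illustrate what "alias" means in this first patch, here is a minimal Python sketch of the statement mapping: OPTIMIZE TABLE <t> corresponds to rewriting the table with its own contents via INSERT OVERWRITE. The helper name `rewrite_optimize` and its regex are hypothetical; Impala's frontend does not work by rewriting SQL strings, and the sketch only accepts plain, optionally database-qualified table names (no quoted identifiers).

```python
import re

# Hypothetical pattern: "OPTIMIZE TABLE <name>" with an optional trailing semicolon.
# Only plain identifiers such as "t" or "db.t" are accepted (no backtick quoting).
_OPTIMIZE_RE = re.compile(r"^\s*OPTIMIZE\s+TABLE\s+([\w.]+)\s*;?\s*$", re.IGNORECASE)


def rewrite_optimize(stmt: str) -> str:
    """Map OPTIMIZE TABLE <t> to the INSERT OVERWRITE it aliases (illustration only)."""
    match = _OPTIMIZE_RE.match(stmt)
    if match is None:
        raise ValueError(f"not an OPTIMIZE TABLE statement: {stmt!r}")
    table = match.group(1)
    # Overwriting the table with its own rows rewrites its data files,
    # which is how INSERT OVERWRITE has been used to compact Iceberg tables.
    return f"INSERT OVERWRITE TABLE {table} SELECT * FROM {table}"


if __name__ == "__main__":
    print(rewrite_optimize("OPTIMIZE TABLE db.events;"))
```

Because the two statements are equivalent in this patch, OPTIMIZE TABLE inherits the INSERT OVERWRITE restrictions listed above.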

Note that executing OPTIMIZE TABLE requires ALL privileges.

Testing:
 - negative tests
 - FE planner test
 - Ranger test
 - E2E tests

Change-Id: Ief42537499ffe64fafdefe25c8d175539234c4e7
Reviewed-on: http://gerrit.cloudera.org:8080/20405
Reviewed-by: Impala Public Jenkins
Tested-by: Impala Public Jenkins


> OPTIMIZE statement to Compact Iceberg Tables
> --------------------------------------------
>
>                 Key: IMPALA-12293
>                 URL: https://issues.apache.org/jira/browse/IMPALA-12293
>             Project: IMPALA
>          Issue Type: New Feature
>          Components: Frontend
>            Reporter: Noemi Pap-Takacs
>            Assignee: Noemi Pap-Takacs
>            Priority: Major
>              Labels: impala-iceberg
>
> A simple syntax to compact Iceberg tables. It executes the following tasks:
>  * compact small files
>  * rewrite partitions according to latest spec
>  * merge delete deltas
> {code:sql}
> Syntax:
> OPTIMIZE TABLE <table_name>
> [ REWRITE DATA ]
> [ ( { FILE_SIZE_THRESHOLD | MIN_INPUT_FILES } = <value> [, ... ] ) ]
> [ WHERE <condition> ];{code}
> Limitations - OPTIMIZE TABLE cannot be executed on the following tables:
>  * Non-Iceberg tables.
>  * Tables with complex-type columns. Currently, Impala does not support
> writing complex types.
>  * Tables whose 'write.format.default' is not Parquet. Impala can only write
> Parquet files.
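The limitations in the issue description can be read as a pre-check before compaction. Below is a minimal Python sketch under stated assumptions: `TableInfo` and `optimize_blockers` are hypothetical stand-ins for real catalog metadata, column types are plain type strings such as `ARRAY<STRING>`, and an unset 'write.format.default' is treated as Parquet (Iceberg's default). It is not Impala's implementation.

```python
from dataclasses import dataclass, field

# Type-name prefixes treated as complex types in this sketch.
COMPLEX_TYPES = ("ARRAY", "MAP", "STRUCT")


@dataclass
class TableInfo:
    """Hypothetical, simplified table metadata used only for this illustration."""
    name: str
    is_iceberg: bool
    column_types: dict[str, str] = field(default_factory=dict)  # column -> type string
    properties: dict[str, str] = field(default_factory=dict)    # table properties


def optimize_blockers(table: TableInfo) -> list[str]:
    """Return the reasons OPTIMIZE TABLE would be rejected; empty means allowed."""
    reasons = []
    if not table.is_iceberg:
        reasons.append("not an Iceberg table")
    for column, col_type in table.column_types.items():
        # "ARRAY<STRING>" -> "ARRAY", "struct<a:int>" -> "STRUCT", "BIGINT" -> "BIGINT"
        base_type = col_type.upper().split("<", 1)[0].strip()
        if base_type in COMPLEX_TYPES:
            reasons.append(f"column '{column}' has complex type {col_type}")
    # Iceberg defaults 'write.format.default' to parquet when it is unset.
    file_format = table.properties.get("write.format.default", "parquet")
    if file_format.lower() != "parquet":
        reasons.append(f"write.format.default is '{file_format}', but Impala only writes Parquet")
    return reasons


if __name__ == "__main__":
    ok = TableInfo("db.events", True, {"id": "BIGINT", "ts": "TIMESTAMP"})
    bad = TableInfo("db.logs", True, {"tags": "ARRAY<STRING>"},
                    {"write.format.default": "orc"})
    print(optimize_blockers(ok))  # []
    print(optimize_blockers(bad))
```

Collecting every blocking reason, rather than stopping at the first, lets a caller report all of a table's problems at once.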



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
