[
https://issues.apache.org/jira/browse/IMPALA-12406?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Noemi Pap-Takacs updated IMPALA-12406:
--------------------------------------
Description:
If an Iceberg table is frequently updated/written to in small batches, a lot of
small files are created. This fragmentation decreases read performance.
Similarly, frequent row-level deletes contribute to this problem by creating
delete files which have to be merged on read.
Currently INSERT OVERWRITE is used as a workaround to rewrite and compact
Iceberg tables.
The OPTIMIZE statement offers a new syntax and an Iceberg-specific solution to
this problem.
This first subtask introduces the new syntax, temporarily as an alias for
INSERT OVERWRITE.
Syntax:
{code:sql}
OPTIMIZE TABLE <table_name>;
{code}
Limitations: OPTIMIZE TABLE cannot be executed on the following tables:
* Tables with partition evolution
* Tables with complex-type columns
* Non-Iceberg tables
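For illustration, the relationship between the current workaround and the new
statement might look like the sketch below. The table name ice_tbl is
hypothetical, and the exact INSERT OVERWRITE form that the alias expands to is
an assumption; the description above only states that OPTIMIZE is temporarily
an alias for INSERT OVERWRITE.
{code:sql}
-- Hypothetical Iceberg table name, used for illustration only.
-- Current workaround: rewrite the table onto itself to compact small files
-- and merge delete files (assumed form of the rewrite).
INSERT OVERWRITE TABLE ice_tbl SELECT * FROM ice_tbl;

-- New syntax from this subtask, expected to have the same effect for now.
OPTIMIZE TABLE ice_tbl;
{code}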
was:
If an Iceberg table is frequently updated/written to in small batches, a lot of
small files are created. This decreases read performance. Similarly, frequent
row-level deletes contribute to this problem by creating delete files which
have to be merged on read.
Currently INSERT OVERWRITE is used as a workaround to rewrite and compact
Iceberg tables.
OPTIMIZE statement offers a new syntax and an Iceberg specific solution to this
problem.
This patch introduces the new syntax as an alias for INSERT OVERWRITE.
{code:java}
Syntax: OPTIMIZE TABLE <table_name>;{code}
> OPTIMIZE statement as an alias for INSERT OVERWRITE
> ---------------------------------------------------
>
> Key: IMPALA-12406
> URL: https://issues.apache.org/jira/browse/IMPALA-12406
> Project: IMPALA
> Issue Type: Sub-task
> Components: Frontend
> Reporter: Noemi Pap-Takacs
> Assignee: Noemi Pap-Takacs
> Priority: Major
> Labels: impala-iceberg