[
https://issues.apache.org/jira/browse/IMPALA-12412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17822286#comment-17822286
]
ASF subversion and git services commented on IMPALA-12412:
----------------------------------------------------------
Commit 47db4fd1f5793ed1e0aaf6004496bf51da8209c6 in impala's branch
refs/heads/master from Noemi Pap-Takacs
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=47db4fd1f ]
IMPALA-12412: Support partition evolution in OPTIMIZE statement
The OPTIMIZE statement is used to execute table maintenance tasks
on Iceberg tables, such as:
1. compacting small files,
2. merging delete deltas,
3. rewriting the table according to the latest schema
and partition spec.
OptimizeStmt used to serve as an alias for INSERT OVERWRITE.
After this change it works as follows: It creates a source statement
that contains all columns of the table. All table content will be
rewritten to new data files. After the executors finish writing,
the Catalog calls the Iceberg RewriteFiles API to commit the changes.
All previous data and delete files are excluded from the next
snapshot, and all newly written data files are added to it.
The old files remain accessible via time travel
to older snapshots of the table.
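The snapshot semantics described above can be illustrated with a small toy model. This is only a sketch in Python, not Impala or Iceberg code: ToyTable, Snapshot, rewrite_all and as_of are hypothetical names invented for this illustration, and the file names are made up. It shows that a full rewrite produces a new snapshot containing only the newly written data files, while earlier snapshots keep pointing at the old data and delete files.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Snapshot:
    """Immutable view of the files that are live at one point in time."""
    snapshot_id: int
    data_files: frozenset
    delete_files: frozenset


@dataclass
class ToyTable:
    """Hypothetical stand-in for a table that keeps its snapshot history."""
    snapshots: list = field(default_factory=list)

    def commit(self, data_files, delete_files=()):
        # Every commit appends a new snapshot; older ones are never modified.
        snap = Snapshot(len(self.snapshots) + 1,
                        frozenset(data_files),
                        frozenset(delete_files))
        self.snapshots.append(snap)
        return snap

    def rewrite_all(self, new_data_files):
        # Models the full rewrite: all previous data and delete files are
        # left out of the next snapshot, only the new data files are added.
        return self.commit(new_data_files)

    def as_of(self, snapshot_id):
        # Models time travel: look up an older snapshot by its id.
        return next(s for s in self.snapshots if s.snapshot_id == snapshot_id)


table = ToyTable()
before = table.commit({"a.parquet", "b.parquet"}, {"a-deletes.parquet"})
after = table.rewrite_all({"compacted-0.parquet"})
old_view = table.as_of(before.snapshot_id)
```

Here `after` holds only the compacted file and no delete files, while `old_view` still lists the original data and delete files.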
By default, Impala has as many file writers as query fragment instances
and therefore can write too many files for unpartitioned tables.
For smaller tables, this can be limited by setting the
MAX_FS_WRITERS query option.
Authorization: OPTIMIZE TABLE requires ALL privileges.
Limitations:
All limitations on writing Iceberg tables apply.
Testing:
 - E2E tests:
     - schema evolution
     - partition evolution
     - UPDATE/DELETE
     - time travel
     - table history
     - negative tests
 - Ranger tests for authorization
 - FE: Planner test:
     - sorting order
     - MAX_FS_WRITERS
     - partitioned exchange
 - Parser test
Change-Id: I65a0c8529d274afff38ccd582f1b8a857716b1b5
Reviewed-on: http://gerrit.cloudera.org:8080/20866
Reviewed-by: Daniel Becker <[email protected]>
Tested-by: Impala Public Jenkins <[email protected]>
> Support partition evolution in OPTIMIZE statement
> -------------------------------------------------
>
> Key: IMPALA-12412
> URL: https://issues.apache.org/jira/browse/IMPALA-12412
> Project: IMPALA
> Issue Type: Sub-task
> Components: Frontend
> Reporter: Noemi Pap-Takacs
> Assignee: Noemi Pap-Takacs
> Priority: Major
> Labels: impala-iceberg
>
> The OPTIMIZE TABLE statement currently uses INSERT OVERWRITE to rewrite Iceberg
> tables. It therefore inherits the limitations of INSERT OVERWRITE, such as the
> inability to rewrite tables with partition evolution.
> This change aims to extend what OPTIMIZE TABLE supports by making
> it independent of the INSERT statement. After the refactoring, the OPTIMIZE TABLE
> statement will be able to:
> * rewrite all files in Iceberg tables according to the latest partition spec
> * compact tables with partition evolution
> This change also serves as a base for further improvements.
> {code:java}
> Syntax: OPTIMIZE TABLE <table_name>;{code}