[
https://issues.apache.org/jira/browse/IMPALA-14734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18057867#comment-18057867
]
Csaba Ringhofer commented on IMPALA-14734:
------------------------------------------
Uploaded https://gerrit.cloudera.org/#/c/23958/, which helps by nearly 50% on
my test case.
An interesting observation: the same sorting also happens after using
Iceberg's planFiles, but it is faster there. It seems that planFiles already
returns the files somewhat sorted (by partition?), which is probably why the
sort is faster.
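
One plausible explanation for this: Java's List.sort on objects uses an
adaptive merge sort (TimSort) that does much less work when the input already
contains long sorted runs. The sketch below only illustrates that effect by
counting comparisons. The class name, the path layout and the sizes are made
up for illustration and are not taken from Impala or Iceberg.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.Random;

public class PresortedSortSketch {
    // Sorts a copy of the input and returns how many comparisons the sort made.
    public static long countComparisons(List<String> paths) {
        long[] counter = {0};
        Comparator<String> counting = (a, b) -> {
            counter[0]++;
            return a.compareTo(b);
        };
        List<String> copy = new ArrayList<>(paths);
        copy.sort(counting);
        return counter[0];
    }

    // Builds hypothetical file paths grouped by partition. Zero padding keeps
    // the generation order identical to the lexicographic order.
    public static List<String> sortedPaths(int partitions, int filesPerPartition) {
        List<String> paths = new ArrayList<>();
        for (int p = 0; p < partitions; p++) {
            for (int f = 0; f < filesPerPartition; f++) {
                paths.add(String.format(
                        "/warehouse/tbl/part=%05d/file-%05d.parquet", p, f));
            }
        }
        return paths;
    }

    public static void main(String[] args) {
        List<String> sorted = sortedPaths(100, 100);
        List<String> shuffled = new ArrayList<>(sorted);
        Collections.shuffle(shuffled, new Random(42));
        System.out.println("comparisons, presorted input: "
                + countComparisons(sorted));
        System.out.println("comparisons, shuffled input:  "
                + countComparisons(shuffled));
    }
}
```

If each comparison has to decode two paths, fewer comparisons translate
directly into fewer decodes, which would be consistent with the observation
above.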
> Planning on large iceberg tables can be dominated by sorting
> ------------------------------------------------------------
>
> Key: IMPALA-14734
> URL: https://issues.apache.org/jira/browse/IMPALA-14734
> Project: IMPALA
> Issue Type: Improvement
> Components: Frontend
> Reporter: Csaba Ringhofer
> Priority: Major
> Labels: iceberg, performance
>
> Noticed that planning on large Iceberg tables can be faster when using
> Iceberg's planFiles than when using Impala's "optimized" path with cached
> file descriptors. The reason seems to be that planning time is dominated by
> sorting file descriptors, which decodes UTF-8 paths in the backing
> FlatBuffers structure O(n log n) times:
> https://github.com/apache/impala/blob/3be15fd3598071eaeddd9b4d29e0883b95fdd14a/fe/src/main/java/org/apache/impala/planner/IcebergScanNode.java#L116
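
A minimal sketch of the cost described above and of one generic way around it:
decode every path once, sort on the decoded keys, then drop the keys. This is
not the code in IcebergScanNode and not necessarily what any fix does. The
Descriptor class is a hypothetical stand-in that stores its path as UTF-8
bytes and counts how often it is decoded.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Random;

public class DecodeOnceSortSketch {
    // Counts path decodes across all descriptors (illustration only).
    public static long decodeCount = 0;

    // Hypothetical stand-in for a descriptor backed by a byte buffer.
    public static final class Descriptor {
        private final byte[] utf8Path;
        public Descriptor(String path) {
            this.utf8Path = path.getBytes(StandardCharsets.UTF_8);
        }
        // Decodes the path from bytes on every call.
        public String getPath() {
            decodeCount++;
            return new String(utf8Path, StandardCharsets.UTF_8);
        }
    }

    // Pairs a descriptor with its path, decoded exactly once.
    static final class Keyed {
        final String path;
        final Descriptor fd;
        Keyed(Descriptor fd) {
            this.fd = fd;
            this.path = fd.getPath();
        }
    }

    // Decodes two paths per comparison, so O(n log n) decodes in total.
    public static List<Descriptor> sortNaive(List<Descriptor> fds) {
        List<Descriptor> out = new ArrayList<>(fds);
        out.sort(Comparator.comparing(Descriptor::getPath));
        return out;
    }

    // Decodes each path once (n decodes), then sorts on the cached strings.
    public static List<Descriptor> sortDecodeOnce(List<Descriptor> fds) {
        List<Keyed> keyed = new ArrayList<>(fds.size());
        for (Descriptor fd : fds) {
            keyed.add(new Keyed(fd));
        }
        keyed.sort(Comparator.comparing((Keyed k) -> k.path));
        List<Descriptor> out = new ArrayList<>(keyed.size());
        for (Keyed k : keyed) {
            out.add(k.fd);
        }
        return out;
    }

    public static void main(String[] args) {
        Random rnd = new Random(7);
        List<Descriptor> fds = new ArrayList<>();
        for (int i = 0; i < 10000; i++) {
            fds.add(new Descriptor("/warehouse/tbl/file-" + rnd.nextInt() + ".parquet"));
        }
        decodeCount = 0;
        sortNaive(fds);
        System.out.println("decodes, naive sort:       " + decodeCount);
        decodeCount = 0;
        sortDecodeOnce(fds);
        System.out.println("decodes, decode-once sort: " + decodeCount);
    }
}
```

The decode-once variant trades extra memory for the cached strings against
fewer decodes. Whether that trade is acceptable for very large tables is a
separate question.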
--
This message was sent by Atlassian Jira
(v8.20.10#820010)