[ 
https://issues.apache.org/jira/browse/IMPALA-14734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18058096#comment-18058096
 ] 

ASF subversion and git services commented on IMPALA-14734:
----------------------------------------------------------

Commit 26e3529c95dd63e50e9f59a9871084dccb28d868 in impala's branch 
refs/heads/master from Csaba Ringhofer
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=26e3529c9 ]

IMPALA-14734: Optimize sorting file descriptors during planning

IcebergScanNode sorts the file descriptors by path (IMPALA-12765).
This can dominate planning time if there are many files.

This change speeds up the sort by avoiding the extraction of Java
Strings from the flatbuffer, which involves UTF-8 decoding. It also
changes a few similar functions to avoid duplicate decoding.

For a table with ~1 million files:
explain select * from bigice limit 1;
before: ~12s
after: ~6.5s

Change-Id: Icb914eb4de7bdadeb876f7dd101e8737b9527b6f
Reviewed-on: http://gerrit.cloudera.org:8080/23958
Reviewed-by: Csaba Ringhofer <[email protected]>
Tested-by: Csaba Ringhofer <[email protected]>
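
A minimal sketch of the idea described in the commit message, not the
actual Impala code: it contrasts a comparator that decodes the path to a
Java String on every comparison with one that compares the stored UTF-8
bytes directly. The class PathSortSketch, the FileDesc stand-in, and the
sample paths are hypothetical and exist only for illustration; the real
file descriptors are flatbuffer-backed and the commit may use a different
technique. Note that unsigned byte order of UTF-8 equals Unicode code point
order, which matches String.compareTo for ASCII paths but can differ for
characters outside the Basic Multilingual Plane.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; none of these names come from Impala.
class PathSortSketch {
    /** Stand-in for a file descriptor that stores its path as UTF-8 bytes. */
    static final class FileDesc {
        final byte[] utf8Path;

        FileDesc(String path) {
            this.utf8Path = path.getBytes(StandardCharsets.UTF_8);
        }

        /** Decodes the path; this is the per-call cost the sort should avoid. */
        String decodePath() {
            return new String(utf8Path, StandardCharsets.UTF_8);
        }
    }

    /** Slow variant: every comparison decodes two paths into new Strings. */
    static void sortByDecodedPath(List<FileDesc> files) {
        files.sort((a, b) -> a.decodePath().compareTo(b.decodePath()));
    }

    /** Faster variant: compares the raw bytes as unsigned values (Java 9+). */
    static void sortByRawBytes(List<FileDesc> files) {
        files.sort((a, b) -> Arrays.compareUnsigned(a.utf8Path, b.utf8Path));
    }

    /** Builds a small unsorted sample list with made-up paths. */
    static List<FileDesc> sample() {
        List<FileDesc> files = new ArrayList<>();
        files.add(new FileDesc("/warehouse/bigice/data/00002-c.parquet"));
        files.add(new FileDesc("/warehouse/bigice/data/00000-a.parquet"));
        files.add(new FileDesc("/warehouse/bigice/data/00001-b.parquet"));
        files.add(new FileDesc("/warehouse/bigice/data/00001-b.orc"));
        return files;
    }

    /** Decodes each path once, only for display or verification. */
    static List<String> paths(List<FileDesc> files) {
        List<String> out = new ArrayList<>();
        for (FileDesc f : files) {
            out.add(f.decodePath());
        }
        return out;
    }

    public static void main(String[] args) {
        List<FileDesc> files = sample();
        sortByRawBytes(files);
        for (String p : paths(files)) {
            System.out.println(p);
        }
    }
}
```

With n files, a comparison sort performs on the order of n log n
comparisons, so the decoding variant decodes each path many times, while
the byte-based variant never decodes during the sort.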


> Planning on large iceberg tables can be dominated by sorting
> ------------------------------------------------------------
>
>                 Key: IMPALA-14734
>                 URL: https://issues.apache.org/jira/browse/IMPALA-14734
>             Project: IMPALA
>          Issue Type: Improvement
>          Components: Frontend
>            Reporter: Csaba Ringhofer
>            Priority: Major
>              Labels: iceberg, performance
>
> I noticed that planning on large Iceberg tables can be faster when using 
> Iceberg's plan files than when using Impala's "optimized" path with cached 
> file descriptors. The reason seems to be that planning time is dominated by 
> sorting file descriptors, which decodes UTF-8 paths in the backing 
> flatbuffer structure O(n log n) times: 
> https://github.com/apache/impala/blob/3be15fd3598071eaeddd9b4d29e0883b95fdd14a/fe/src/main/java/org/apache/impala/planner/IcebergScanNode.java#L116



