[ 
https://issues.apache.org/jira/browse/IMPALA-12477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltán Borók-Nagy resolved IMPALA-12477.
----------------------------------------
    Resolution: Won't Fix

After further analysis, there is actually no difference between the for-loop 
and the forEach().
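
A minimal sketch of why the two forms behave alike, assuming the iterable does not override forEach(): the default Iterable.forEach() in the JDK simply walks the iterator on the calling thread, exactly as an enhanced for-loop does. The class and method names below are hypothetical and not taken from Impala, Hive, or Iceberg; plain strings stand in for file scan tasks. Any parallelism would have to come from the iterable returned by planFiles() itself, not from the choice of loop construct.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical demo class; the strings stand in for planned data files.
class ForEachVsForLoop {
    // Records which threads visit the elements when using an enhanced for-loop.
    static Set<String> threadsUsedByForLoop(Iterable<String> files) {
        Set<String> threads = new LinkedHashSet<>();
        for (String file : files) {
            threads.add(Thread.currentThread().getName());
        }
        return threads;
    }

    // Records which threads visit the elements when using Iterable.forEach().
    static Set<String> threadsUsedByForEach(Iterable<String> files) {
        Set<String> threads = new LinkedHashSet<>();
        files.forEach(file -> threads.add(Thread.currentThread().getName()));
        return threads;
    }

    public static void main(String[] args) {
        List<String> backing = Arrays.asList("a.parquet", "b.parquet", "c.parquet");
        // Wrap the list as a bare Iterable so the default Iterable.forEach() is used.
        Iterable<String> files = backing::iterator;

        System.out.println("for-loop threads: " + threadsUsedByForLoop(files));
        System.out.println("forEach threads:  " + threadsUsedByForEach(files));
    }
}
```

Both methods report a single thread, the caller's, so switching from one to the other does not by itself spread work across threads.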

> Make Iceberg planFiles() use multiple threads
> ---------------------------------------------
>
>                 Key: IMPALA-12477
>                 URL: https://issues.apache.org/jira/browse/IMPALA-12477
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Catalog, Frontend
>            Reporter: Zoltán Borók-Nagy
>            Assignee: Zoltán Borók-Nagy
>            Priority: Major
>              Labels: impala-iceberg, performance
>
> Impala is not using Iceberg’s planFiles() API in a performant way:
> [https://github.com/apache/impala/blob/2d3289027c2ffdd245d13b60e6fa3f9b3e7bf833/fe/[…]java/org/apache/impala/catalog/iceberg/GroupedContentFiles.java|https://github.com/apache/impala/blob/2d3289027c2ffdd245d13b60e6fa3f9b3e7bf833/fe/src/main/java/org/apache/impala/catalog/iceberg/GroupedContentFiles.java#L46]
> Instead of a for-loop we should use a forEach() like Hive does:
> [https://github.com/apache/hive/blob/071b721d8d73cc4d5d2d9469d7953bdc75ff615f/icebe[…]in/java/org/apache/iceberg/mr/mapreduce/IcebergInputFormat.java|https://github.com/apache/hive/blob/071b721d8d73cc4d5d2d9469d7953bdc75ff615f/iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/mapreduce/IcebergInputFormat.java#L222]
> The forEach() spreads the work across multiple threads.
> This will not only improve table loading times, but also speed up queries 
> that use planFiles(), e.g. queries that push down predicates to Iceberg and 
> time-travel queries.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

