Gabor Kaszab created IMPALA-12251:
-------------------------------------

             Summary: Table migration to run on multiple partitions in parallel
                 Key: IMPALA-12251
                 URL: https://issues.apache.org/jira/browse/IMPALA-12251
             Project: IMPALA
          Issue Type: New Feature
          Components: Frontend
            Reporter: Gabor Kaszab


https://issues.apache.org/jira/browse/IMPALA-11013 introduces table migration 
from legacy Hive tables to Iceberg tables. The parallelization in that patch is 
based on the files within a partition. If a table has many partitions with only 
a few files in each, this approach does not perform well.

As an improvement, we can also implement parallelization across partitions 
and then decide which approach to use based on the ratio of the number of 
partitions to the average number of files per partition.
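
A rough sketch of how such a decision could look, written in Python only for 
illustration (the real change would live in the Impala frontend). The names 
Strategy, choose_strategy and ratio_threshold are hypothetical and the 
threshold value is an assumption, not something decided in this ticket:

```python
from enum import Enum
from typing import Sequence


class Strategy(Enum):
    BY_FILE = "parallelize over files within a partition"
    BY_PARTITION = "parallelize over partitions"


def choose_strategy(files_per_partition: Sequence[int],
                    ratio_threshold: float = 1.0) -> Strategy:
    """Pick a strategy from (# partitions) / (avg # files per partition).

    ratio_threshold is a hypothetical tuning knob; a suitable value would
    have to be found by measurement.
    """
    num_partitions = len(files_per_partition)
    total_files = sum(files_per_partition)
    if num_partitions == 0 or total_files == 0:
        # Nothing to migrate: keep the existing file-based behavior.
        return Strategy.BY_FILE
    avg_files = total_files / num_partitions
    ratio = num_partitions / avg_files
    # Many partitions with few files each gives a large ratio.
    if ratio > ratio_threshold:
        return Strategy.BY_PARTITION
    return Strategy.BY_FILE
```

With this rule, a table with 1000 partitions of 2 files each would be migrated 
partition by partition, while a table with 2 partitions of 500 files each 
would keep the current file-based parallelization.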



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
