Zoltán Borók-Nagy created IMPALA-13445:
------------------------------------------

             Summary: Cost-based planning can reduce writer parallelism too much
                 Key: IMPALA-13445
                 URL: https://issues.apache.org/jira/browse/IMPALA-13445
             Project: IMPALA
          Issue Type: Bug
            Reporter: Zoltán Borók-Nagy


When Cost-based planning is used, writer parallelism is limited  by number of 
partitions:

[https://github.com/apache/impala/blob/2535e79491078a0353dbeed1a094e91366906149/fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java#L260]

E.g. this means unpartitioned tables are always written by a single writer.

This is beneficial to avoid small files problems, but in some cases it can  
cause a serious performance degradation when the amount of data need to be 
written is large.

Number of partitions shouldn't always limit the number of writers. It would be 
nice to come up with a distribution that allows more writers than partitions 
and records aren't just randomly shuffled across all writers, but have some 
affinity to a set of writers based on their partitions.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to