[ 
https://issues.apache.org/jira/browse/IMPALA-13445?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Riza Suminto resolved IMPALA-13445.
-----------------------------------
     Fix Version/s: Impala 4.5.0
    Target Version: Impala 4.5.0
        Resolution: Fixed

> Cost-based planning can reduce writer parallelism too much
> ----------------------------------------------------------
>
>                 Key: IMPALA-13445
>                 URL: https://issues.apache.org/jira/browse/IMPALA-13445
>             Project: IMPALA
>          Issue Type: Bug
>            Reporter: Zoltán Borók-Nagy
>            Assignee: Riza Suminto
>            Priority: Critical
>             Fix For: Impala 4.5.0
>
>
> When Cost-based planning is used, writer parallelism is limited  by number of 
> partitions:
> [https://github.com/apache/impala/blob/2535e79491078a0353dbeed1a094e91366906149/fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java#L260]
> E.g. this means unpartitioned tables are always written by a single writer.
> This is beneficial to avoid small files problems, but in some cases it can  
> cause a serious performance degradation when the amount of data need to be 
> written is large.
> Number of partitions shouldn't always limit the number of writers. It would 
> be nice to come up with a distribution that allows more writers than 
> partitions and records aren't just randomly shuffled across all writers, but 
> have some affinity to a set of writers based on their partitions.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to