Zoltán Borók-Nagy created IMPALA-13445:
------------------------------------------
Summary: Cost-based planning can reduce writer parallelism too much
Key: IMPALA-13445
URL: https://issues.apache.org/jira/browse/IMPALA-13445
Project: IMPALA
Issue Type: Bug
Reporter: Zoltán Borók-Nagy
When cost-based planning is used, writer parallelism is limited by the number of
partitions:
[https://github.com/apache/impala/blob/2535e79491078a0353dbeed1a094e91366906149/fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java#L260]
E.g. this means unpartitioned tables are always written by a single writer.
This helps avoid the small files problem, but in some cases it can cause
serious performance degradation when the amount of data that needs to be
written is large.
The number of partitions shouldn't always limit the number of writers. It would
be nice to come up with a distribution that allows more writers than partitions,
where records aren't just randomly shuffled across all writers but have some
affinity to a set of writers based on their partitions.
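
One possible shape for such a distribution, as a rough sketch only: hash each
partition key to a "base" writer and let its rows spread over a small group of
consecutive writers starting there. This is not Impala code. The names
stable_hash, writers_for_partition, pick_writer and the fanout parameter are
all hypothetical and only illustrate the idea.

```python
import zlib
from typing import List


def stable_hash(value: str) -> int:
    """Deterministic hash so every sender maps a partition the same way."""
    return zlib.crc32(value.encode("utf-8"))


def writers_for_partition(partition_key: str, num_writers: int, fanout: int) -> List[int]:
    """Return the group of writers that may receive rows of this partition.

    fanout (hypothetical knob) is the number of writers per partition; it is
    clamped to the range [1, num_writers].
    """
    fanout = max(1, min(fanout, num_writers))
    base = stable_hash(partition_key) % num_writers
    # Consecutive writers starting at the base, wrapping around the ring.
    return [(base + i) % num_writers for i in range(fanout)]


def pick_writer(partition_key: str, row_seq: int, num_writers: int, fanout: int) -> int:
    """Pick a writer for one row: round-robin inside the partition's group."""
    group = writers_for_partition(partition_key, num_writers, fanout)
    return group[row_seq % len(group)]


if __name__ == "__main__":
    # An unpartitioned table is modelled here as one empty partition key.
    used = {pick_writer("", seq, num_writers=8, fanout=4) for seq in range(100)}
    print(sorted(used))
```

In this sketch, fanout=1 gives one writer per partition, which matches the
single writer for unpartitioned tables described above, while fanout equal to
the number of writers spreads each partition's rows over all writers. Values in
between trade file count against write parallelism; how fanout would be chosen
(e.g. from estimated data volume per partition) is left open here.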
--
This message was sent by Atlassian Jira
(v8.20.10#820010)