Riza Suminto created IMPALA-14006:
-------------------------------------
Summary: Scheduler::CreateInputCollocatedInstances may overparallelize
Key: IMPALA-14006
URL: https://issues.apache.org/jira/browse/IMPALA-14006
Project: IMPALA
Issue Type: Bug
Components: Distributed Exec
Affects Versions: Impala 5.0.0
Reporter: Riza Suminto
Attachments: profile_614699efbc5e2ed9_cb9d6aca00000000.txt
IMPALA-11604 (part 2) changed how many instances are created in
Scheduler::CreateInputCollocatedInstances.
[https://github.com/apache/impala/blame/3c24706c72818a1668159a428d4f2afcadea9f27/be/src/scheduling/scheduler.cc#L915-L920]
This works when the left child fragment of a parent fragment is distributed
across nodes. However, there is a corner case where the left child fragment is
limited to a single instance on a single node, like the SELECT fragment in
[^profile_614699efbc5e2ed9_cb9d6aca00000000.txt]:
{noformat}
F01:PLAN FRAGMENT [UNPARTITIONED] hosts=1 instances=1
Per-Instance Resources: mem-estimate=4.27MB mem-reservation=4.00MB thread-reservation=1
max-parallelism=1 segment-costs=[17820]
04:SELECT
| predicates: dense_rank() < CAST(10 AS BIGINT)
| mem-estimate=0B mem-reservation=0B thread-reservation=0
| tuple-ids=7,6 row-size=12B cardinality=730 cost=7300
| in pipelines: 02(GETNEXT)
|
03:ANALYTIC
| functions: dense_rank()
| order by: id ASC
| window: RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
| mem-estimate=4.00MB mem-reservation=4.00MB spill-buffer=2.00MB thread-reservation=0
| tuple-ids=7,6 row-size=12B cardinality=7.30K cost=7300
| in pipelines: 02(GETNEXT)
|
06:MERGING-EXCHANGE [UNPARTITIONED]
| order by: id ASC
| mem-estimate=33.50KB mem-reservation=0B thread-reservation=0
| tuple-ids=7 row-size=4B cardinality=7.30K cost=1904
| in pipelines: 02(GETNEXT){noformat}
Because of the MERGING-EXCHANGE, this fragment is limited to just 1 instance on
1 node. If its parent fragment has a high cost, the parent fragment might be
scheduled with hundreds of instances, but
Scheduler::CreateInputCollocatedInstances will force all of them to colocate on
the single executor node where the SELECT fragment is scheduled.
This, in turn, trips the sanity check in Scheduler::CheckEffectiveInstanceCount,
which enforces that no fragment is parallelized over more than 128 instances
per executor node.
[https://github.com/apache/impala/blob/3c24706c72818a1668159a428d4f2afcadea9f27/be/src/scheduling/scheduler.cc#L456-L463]
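The arithmetic behind the failure can be sketched with a small standalone model. This is not Impala's scheduler code: ColocateWithInput, ExceedsPerHostLimit, the host names, and the round-robin placement are all hypothetical simplifications. Only the 128-instances-per-node limit comes from the description above.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Per-node limit quoted in this report for CheckEffectiveInstanceCount.
constexpr int kMaxInstancesPerHost = 128;

// Hypothetical model: place `parent_instances` instances only on the hosts
// that run the left child (input) fragment, round-robin. The real scheduler
// may distribute them differently; this only illustrates the host restriction.
std::map<std::string, int> ColocateWithInput(
    const std::vector<std::string>& input_hosts, int parent_instances) {
  assert(!input_hosts.empty());
  std::map<std::string, int> per_host;
  for (int i = 0; i < parent_instances; ++i) {
    std::size_t idx = static_cast<std::size_t>(i) % input_hosts.size();
    ++per_host[input_hosts[idx]];
  }
  return per_host;
}

// Returns true if any host ends up with more instances than the limit.
bool ExceedsPerHostLimit(const std::map<std::string, int>& per_host) {
  for (const auto& entry : per_host) {
    if (entry.second > kMaxInstancesPerHost) return true;
  }
  return false;
}
```

In this model, 200 parent instances colocated with an input fragment on 2 hosts give 100 per host, which stays under the limit. The same 200 instances colocated with an input fragment pinned to 1 host all land on that host and exceed it.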