[
https://issues.apache.org/jira/browse/IMPALA-12120?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Riza Suminto resolved IMPALA-12120.
-----------------------------------
Fix Version/s: Impala 4.3.0
Resolution: Fixed
> Set appropriate output writer parallelism when using new processing cost
> planner
> --------------------------------------------------------------------------------
>
> Key: IMPALA-12120
> URL: https://issues.apache.org/jira/browse/IMPALA-12120
> Project: IMPALA
> Issue Type: Improvement
> Components: Frontend
> Reporter: David Rorke
> Assignee: Riza Suminto
> Priority: Major
> Fix For: Impala 4.3.0
>
>
> The new processing-cost-based planner changes (IMPALA-11604, IMPALA-12091)
> will affect output writer parallelism for insert queries, with the potential
> for more small files if processing-cost-based planning results in too many
> writer fragments. This could further exacerbate a problem that was
> introduced with mt_dop (see IMPALA-8125).
> There are two cases to consider:
> 1. Unpartitioned inserts, where the output writer is in the same fragment as
> the scan. In this case the output parallelism is determined by the scan
> parallelism, which may increase (vs. mt_dop) with the changes in IMPALA-12091.
> 2. Partitioned inserts, where the output writer fragment typically consists of
> a sort followed by the writer, and the parallelism under IMPALA-11604 is
> driven by the estimated sort cost. Again we have the potential to
> over-parallelize, resulting in too many small files.
> The MAX_FS_WRITERS query option (IMPALA-8125) can help mitigate this, but we
> should have better default behavior even when MAX_FS_WRITERS isn't set. With
> no query options set, the planner should avoid creating excessive writer
> parallelism for both partitioned and unpartitioned inserts. We could also
> consider always including an exchange (even in the unpartitioned case) to
> decouple the writer parallelism from the scan parallelism.
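> As an interim mitigation, the MAX_FS_WRITERS option already mentioned above
> can cap writer parallelism per insert. A minimal sketch (the table names and
> the cap value of 8 are hypothetical, chosen only for illustration):
>
> ```
> -- Cap the number of concurrent table-sink (writer) instances for this
> -- session, limiting the number of output files per partition.
> SET MAX_FS_WRITERS=8;
> INSERT INTO sales_archive PARTITION (sale_date)
> SELECT * FROM sales_staging;
> ```
>
> This bounds the file count regardless of how many scan or sort fragments the
> cost-based planner chooses, which is why the issue proposes making comparable
> behavior the default.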
--
This message was sent by Atlassian Jira
(v8.20.10#820010)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]