[ 
https://issues.apache.org/jira/browse/IMPALA-12120?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Riza Suminto resolved IMPALA-12120.
-----------------------------------
    Fix Version/s: Impala 4.3.0
       Resolution: Fixed

> Set appropriate output writer parallelism when using new processing cost 
> planner
> --------------------------------------------------------------------------------
>
>                 Key: IMPALA-12120
>                 URL: https://issues.apache.org/jira/browse/IMPALA-12120
>             Project: IMPALA
>          Issue Type: Improvement
>          Components: Frontend
>            Reporter: David Rorke
>            Assignee: Riza Suminto
>            Priority: Major
>             Fix For: Impala 4.3.0
>
>
> The new processing cost based planner changes (IMPALA-11604, IMPALA-12091) 
> will impact output writer parallelism for insert queries, with the potential 
> for more small files if the processing cost based planning results in too 
> many writer fragments.  This could further exacerbate a problem that was 
> introduced with mt_dop (see IMPALA-8125). 
> There are 2 cases to consider:
>  # Unpartitioned inserts where the output writer is in the same fragment as 
> the scan.  In this case the output parallelism will be determined by the scan 
> parallelism which may increase (vs mt_dop) with the changes in IMPALA-12091.
>  # Partitioned inserts where the output writer fragment typically consists of 
> a sort followed by the writer, and the parallelism under IMPALA-11604 is 
> driven by the estimated sort cost.  Again we have the potential to 
> overparallelize resulting in too many small files.
> The MAX_FS_WRITERS query option (IMPALA-8125) can help mitigate this but we 
> should have better default behavior even when MAX_FS_WRITERS isn't set.  The 
> default output writer parallelism with no query options set should avoid 
> creating excessive writer parallelism for both partitioned and unpartitioned 
> inserts.  We could also consider always including an exchange (even in the 
> unpartitioned case) to decouple the writer from the scan parallelism.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to