[jira] [Updated] (DRILL-1762) Apply different levels of parallelization to different phases of the query

Jacques Nadeau (JIRA) Sun, 04 Jan 2015 15:58:10 -0800

     [ 
https://issues.apache.org/jira/browse/DRILL-1762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Jacques Nadeau updated DRILL-1762:
----------------------------------
    Fix Version/s:     (was: 0.8.0)
                   Future

> Apply different levels of parallelization to different phases of the query 
> ---------------------------------------------------------------------------
>
>                 Key: DRILL-1762
>                 URL: https://issues.apache.org/jira/browse/DRILL-1762
>             Project: Apache Drill
>          Issue Type: Improvement
>          Components: Query Planning & Optimization
>    Affects Versions: 0.6.0
>         Environment: 10-node Drill cluster with TPCH schema (SF100)
> OS: RHEL 6.4
> cores/node: 32
> RAM: 256GB
>            Reporter: Kunal Khatua
>             Fix For: Future
>
>
> When running TPCH queries, we found that setting
> planner.max.width_per_node to the maximum number of cores available didn't
> necessarily improve their runtimes. In almost all queries, setting the
> property as low as 12 (on each 32-core node) resulted in faster
> runtimes, which is counter-intuitive.
> It is quite possible that the leaf fragments have a larger overhead than
> many other fragments due to I/O operations, and that the high level of
> parallelization is making the entire data scan phase
> sub-optimal. Based on further investigation, we might want to split this
> property in two: one for leaf fragments (SCANs) and one for all other
> fragments.
> e.g. TPCH 08 (SF100):
> MaxWidth      Runtime (msec)
> 12            34,650
> 16            35,216
> 20            39,482
> 24            41,694
> 28            46,579
> 32            57,207
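
To make the trend in the table easier to read, the following minimal sketch computes each runtime relative to the fastest setting. The numbers are copied from the table above; the function and variable names are illustrative only and are not part of Drill.

```python
# Runtimes copied from the TPCH 08 (SF100) table: max width per node -> msec.
runtimes_msec = {
    12: 34650,
    16: 35216,
    20: 39482,
    24: 41694,
    28: 46579,
    32: 57207,
}


def slowdown_vs_best(runtimes):
    """Return each runtime divided by the fastest runtime in the mapping."""
    best = min(runtimes.values())
    return {width: msec / best for width, msec in runtimes.items()}


slowdowns = slowdown_vs_best(runtimes_msec)
best_width = min(runtimes_msec, key=runtimes_msec.get)

for width in sorted(slowdowns):
    print(f"width={width:2d}  runtime={runtimes_msec[width]:6d} msec  "
          f"x{slowdowns[width]:.2f} vs width={best_width}")
```

On these figures the fastest setting is a width of 12, and the run at 32 (one fragment per core) takes roughly 65% longer.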



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
