ASF GitHub Bot commented on DRILL-6115:

GitHub user HanumathRao opened a pull request:


    DRILL-6115: SingleMergeExchange is not scaling up when many minor fragments are allocated for a query.

    Currently, a SingleMergeExchange merges all the fragment streams on the 
foreman. This can cause a CPU bottleneck and high memory consumption at the 
foreman.
    This PR introduces a new multiplex operator called 
OrderedMuxExchange, which merges the minor fragment streams belonging to one 
drillbit and sends them to the foreman as a single output stream. 
    The existing multiplex mechanism is used to introduce these operators.
    Please review this PR.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/HanumathRao/drill DRILL-6115

Alternatively you can review and apply these changes as the patch at:


To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #1110

commit 43a71277aeec9bb377181728b2ce563437d7e46d
Author: hmaduri <hmaduri@...>
Date:   2018-01-22T00:42:28Z

    DRILL-6115: SingleMergeExchange is not scaling up when many minor fragments 
are allocated for a query.


> SingleMergeExchange is not scaling up when many minor fragments are allocated 
> for a query.
> ------------------------------------------------------------------------------------------
>                 Key: DRILL-6115
>                 URL: https://issues.apache.org/jira/browse/DRILL-6115
>             Project: Apache Drill
>          Issue Type: Improvement
>          Components: Execution - Relational Operators
>    Affects Versions: 1.12.0
>            Reporter: Hanumath Rao Maduri
>            Assignee: Hanumath Rao Maduri
>            Priority: Major
>             Fix For: 1.13.0
>         Attachments: Enhancing Drill to multiplex ordered merge exchanges.docx
> SingleMergeExchange is created when a global order is required in the output. 
> The following query produces the SingleMergeExchange.
> {code:java}
> 0: jdbc:drill:zk=local> explain plan for select L_LINENUMBER from 
> dfs.`/drill/tables/lineitem` order by L_LINENUMBER;
> +------+------+
> | text | json |
> +------+------+
> | 00-00 Screen
> 00-01 Project(L_LINENUMBER=[$0])
> 00-02 SingleMergeExchange(sort0=[0])
> 01-01 SelectionVectorRemover
> 01-02 Sort(sort0=[$0], dir0=[ASC])
> 01-03 HashToRandomExchange(dist0=[[$0]])
> 02-01 Scan(table=[[dfs, /drill/tables/lineitem]], 
> groupscan=[JsonTableGroupScan [ScanSpec=JsonScanSpec 
> [tableName=maprfs:///drill/tables/lineitem, condition=null], 
> columns=[`L_LINENUMBER`], maxwidth=15]])
> {code}
> On a 10-node cluster, if the table is huge, Drill can spawn many minor 
> fragments, all of which are merged on a single node by one merge receiver. 
> This creates a lot of memory pressure on the receiver node and also an 
> execution bottleneck. To address this issue, the merge receiver should be a 
> multiphase merge receiver. 
> Ideally, for a large cluster, one could introduce tree merges so that merging 
> can be done in parallel. But as a first step, I think it is better to use the 
> existing infrastructure for multiplexing operators to generate an OrderedMux, 
> so that all the minor fragments belonging to one drillbit are merged locally 
> and the merged data is sent across to the receiver operator.
> For example, on a 10-node cluster where each node processes 14 minor 
> fragments, the current version of the code merges 140 minor fragment streams 
> at the receiver. 
> The proposed version uses a two-level merge: each drillbit merges its 14 
> streams locally, in parallel with the other drillbits, 
> and the receiver node then merges the resulting 10 streams.
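
The two-level merge described above can be illustrated with a minimal, self-contained sketch. This is not Drill's implementation and uses none of Drill's classes: the class name TwoLevelMergeSketch and the methods mergeSorted and twoLevelMerge are hypothetical, and plain in-memory integer lists stand in for the sorted minor fragment streams. It only shows the shape of the idea: with 10 nodes and 14 streams per node, the final merge sees 10 inputs instead of 140, while the 14-way merges can run independently on each node.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical illustration only; not part of Apache Drill.
final class TwoLevelMergeSketch {

    /** One cursor per input stream: its current head value plus the remaining elements. */
    private static final class Cursor {
        final int head;
        final Iterator<Integer> rest;

        Cursor(int head, Iterator<Integer> rest) {
            this.head = head;
            this.rest = rest;
        }
    }

    /** K-way merge of already-sorted streams, using a min-heap keyed on each stream's head. */
    static List<Integer> mergeSorted(List<List<Integer>> sortedStreams) {
        PriorityQueue<Cursor> heap =
            new PriorityQueue<>((a, b) -> Integer.compare(a.head, b.head));
        for (List<Integer> stream : sortedStreams) {
            Iterator<Integer> it = stream.iterator();
            if (it.hasNext()) {
                heap.add(new Cursor(it.next(), it));
            }
        }
        List<Integer> out = new ArrayList<>();
        while (!heap.isEmpty()) {
            Cursor smallest = heap.poll();
            out.add(smallest.head);
            if (smallest.rest.hasNext()) {
                heap.add(new Cursor(smallest.rest.next(), smallest.rest));
            }
        }
        return out;
    }

    /**
     * Level 1: merge each node's streams locally (these merges are independent,
     * so in a real cluster they could run in parallel, one per node).
     * Level 2: merge the per-node results at a single receiver.
     */
    static List<Integer> twoLevelMerge(List<List<List<Integer>>> streamsByNode) {
        List<List<Integer>> perNode = new ArrayList<>();
        for (List<List<Integer>> nodeStreams : streamsByNode) {
            perNode.add(mergeSorted(nodeStreams));
        }
        return mergeSorted(perNode);
    }

    public static void main(String[] args) {
        // Two "nodes", each holding two sorted "minor fragment" streams.
        List<List<List<Integer>>> streamsByNode = Arrays.asList(
            Arrays.asList(Arrays.asList(1, 4, 7), Arrays.asList(2, 9)),
            Arrays.asList(Arrays.asList(3, 5), Arrays.asList(6, 8)));
        System.out.println(twoLevelMerge(streamsByNode)); // [1, 2, 3, 4, 5, 6, 7, 8, 9]
    }
}
```

In this sketch the heap at the final merge never holds more entries than there are nodes, which mirrors the reduced fan-in at the receiver that the proposal is after.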

This message was sent by Atlassian JIRA
