[jira] [Comment Edited] (DRILL-5294) Managed External Sort throws an OOM during the merge and spill phase

Paul Rogers (JIRA) Fri, 24 Feb 2017 12:36:06 -0800

    [ 
https://issues.apache.org/jira/browse/DRILL-5294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15883418#comment-15883418
 ]


Paul Rogers edited comment on DRILL-5294 at 2/24/17 8:35 PM:
-------------------------------------------------------------

The original test case works with the latest code. I tested the long query 
using a single slice (all that can be run on the Mac) and 2 GB of sort memory.

{code}
Results: 0 records, 1 batches, 208,341 ms
{code}

I also tested an adaptation of the second query against the 18 GB "250wide.tbl" 
file:

{code}
select * from (select * from `dfs.data`.`250wide.tbl` d
  where cast(d.columns[1] as int) > 0 order by columns[0]) d1
where d1.columns[0] = 'kjhf'
Results: 0 records, 1 batches, 356,243 ms
{code}

The second use case completes but is slow because it does a binary merge: 
it merges two batches, spills the result, and repeats until only two runs remain:

{code}
select * from (select * from `dfs.data`.`250wide-small.tbl` order by columns[0]) d
where d.columns[0] = 'ljdfhwuehnoiueyf'
Results: 0 records, 1 batches, 26,753 ms
{code}
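The cost of a binary merge can be shown with a rough cost model. The Java below is a hypothetical sketch, not Drill's managed sort code. It assumes equal-size spilled runs, merges the oldest runs first, and counts only the records that are read back and rewritten to disk during intermediate merges. The names {{MergeCostSketch}}, {{simulate}}, and {{fanIn}} are illustrative only. A fan-in of 2 models the binary merge described above.

{code}
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Deque;
import java.util.List;

/**
 * Hypothetical cost model (not Drill code): estimates how much spilled data
 * is rewritten when runs are merged at most fanIn at a time before the
 * final merge.
 */
public class MergeCostSketch {

    /** Outcome of the intermediate merge/spill phase. */
    static final class Cost {
        final int mergeSteps;        // intermediate merge-and-spill steps
        final long recordsRewritten; // records read back and written to disk again

        Cost(int mergeSteps, long recordsRewritten) {
            this.mergeSteps = mergeSteps;
            this.recordsRewritten = recordsRewritten;
        }
    }

    /**
     * Merges spilled runs (oldest first) until no more than fanIn remain,
     * so that the final merge can read all of them at once.
     */
    static Cost simulate(List<Long> runSizes, int fanIn) {
        if (fanIn < 2) {
            throw new IllegalArgumentException("fanIn must be at least 2");
        }
        Deque<Long> runs = new ArrayDeque<>(runSizes);
        int steps = 0;
        long rewritten = 0;
        while (runs.size() > fanIn) {
            // Merge only as many runs as needed, never more than fanIn.
            int take = Math.min(fanIn, runs.size() - fanIn + 1);
            long merged = 0;
            for (int i = 0; i < take; i++) {
                merged += runs.pollFirst();
            }
            runs.addLast(merged); // merged output is spilled as a new run
            rewritten += merged;
            steps++;
        }
        return new Cost(steps, rewritten);
    }

    public static void main(String[] args) {
        List<Long> eightRuns = Collections.nCopies(8, 100L);
        for (int fanIn : new int[] {2, 4, 8}) {
            Cost c = simulate(eightRuns, fanIn);
            System.out.printf("fanIn=%d: %d merge steps, %d records rewritten%n",
                    fanIn, c.mergeSteps, c.recordsRewritten);
        }
    }
}
{code}

Consider eight spilled runs of 100 records each. With a fan-in of 2, the model needs six merge steps and rewrites 1,600 records, so each record is written to disk twice more. With a fan-in of 4 it needs two steps and 600 records. With a fan-in of 8 the final merge reads all runs directly. Under these assumptions, a wider merge means less repeated spill I/O.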

The third case also succeeds:

{code}
select * from (select * from `dfs.data`.`250wide_files` d
  where cast(d.columns[1] as int) > 0 order by columns[0]) d1
where d1.columns[0] = 'kjhf'
Results: 0 records, 1 batches, 9,987 ms
{code}

Testing turned up one minor fix, which will be pushed to the Sort-Rollup branch 
and included in the DRILL-5284 PR.




> Managed External Sort throws an OOM during the merge and spill phase
> --------------------------------------------------------------------
>
>                 Key: DRILL-5294
>                 URL: https://issues.apache.org/jira/browse/DRILL-5294
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Execution - Relational Operators
>            Reporter: Rahul Challapalli
>            Assignee: Paul Rogers
>             Fix For: 1.10.0
>
>         Attachments: 2751ce6d-67e6-ae08-3b68-e33b29f9d2a3.sys.drill, 
> drillbit.log, drillbit_scenario2.log, drillbit_scenario3.log, 
> scenario2_profile.sys.drill, scenario3_profile.sys.drill
>
>
> commit # : 38f816a45924654efd085bf7f1da7d97a4a51e38
> The query below fails with the managed sort but succeeds with the old sort:
> {code}
> select * from (select columns[433] col433, columns[0], columns[1], columns[2],
>   columns[3], columns[4], columns[5], columns[6], columns[7], columns[8],
>   columns[9], columns[10], columns[11]
>   from dfs.`/drill/testdata/resource-manager/3500cols.tbl`
>   order by columns[450], columns[330], columns[230], columns[220], columns[110],
>   columns[90], columns[80], columns[70], columns[40], columns[10], columns[20],
>   columns[30], columns[40], columns[50]) d
> where d.col433 = 'sjka skjf';
> Error: RESOURCE ERROR: External Sort encountered an error while spilling to disk
> Fragment 1:11
> [Error Id: 0aa20284-cfcc-450f-89b3-645c280f33a4 on qa-node190.qa.lab:31010] (state=,code=0)
> {code}
> Env : 
> {code}
> No of Drillbits : 1
> DRILL_MAX_DIRECT_MEMORY="32G"
> DRILL_MAX_HEAP="4G"
> {code}
> Attached the logs and profile. The data is too large to attach to a JIRA.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)