[
https://issues.apache.org/jira/browse/DRILL-5885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16208546#comment-16208546
]
Robert Hou commented on DRILL-5885:
-----------------------------------
From the notes in DRILL-5670:
The OOM occurs during the merge phase:
{noformat}
Completed spill: memory = 0
Starting merge phase. Runs = 62, Alloc. memory = 0
Read 100 records in 73169 us; size = 8480768, memory = 8481024
...
Read 100 records in 81261 us; size = 8480768, memory = 525807872
{noformat}
Here, the "Read 100 records" lines show the sort loading the first batch of
each of the 62 spill files. As noted in the previous comment, the first spilled
batch was 4,736,187 bytes when written, but it requires 8,481,024 bytes when
read. This is larger than the 7,215,150 bytes that the memory calculations
estimated. The average load size is 8,480,772 bytes (525,807,872 bytes / 62
batches). The 1,265,622-byte delta per batch adds up to a 78,468,572-byte error
over the 62 batches.
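The arithmetic above can be checked with a short Python sketch. The byte counts are copied from the log and the previous comment; the helper name `merge_load_error` is hypothetical and is not part of Drill. The sketch measures the total error against the whole-phase budget (runs x estimate), which reproduces the 78,468,572-byte figure exactly. Multiplying the rounded 1,265,622-byte delta by 62 instead gives 78,468,564 bytes, because the true average is not a whole number of bytes.

```python
# Figures copied from the merge-phase log and the comment above.
RUNS = 62                      # spill files merged ("Runs = 62")
WRITTEN_BYTES = 4_736_187      # first spilled batch, size when written
FIRST_READ_BYTES = 8_481_024   # memory after loading the first batch
FINAL_MEMORY = 525_807_872     # memory after loading one batch from every run
ESTIMATE_BYTES = 7_215_150     # per-batch estimate from the sort's memory calcs


def merge_load_error(runs, final_memory, estimate):
    """Return (average load size, per-batch delta, total error) in bytes.

    Hypothetical helper for illustration only. The total error compares the
    observed memory with the planned budget (runs * estimate), so it stays an
    exact integer even though the per-batch average is fractional.
    """
    average = final_memory / runs
    total_error = final_memory - runs * estimate
    return average, average - estimate, total_error


avg, delta, total = merge_load_error(RUNS, FINAL_MEMORY, ESTIMATE_BYTES)
# Ratio of in-memory size on read to on-disk size on write for the first batch.
growth = FIRST_READ_BYTES / WRITTEN_BYTES

print(f"average load size : {avg:,.0f} bytes")
print(f"delta per batch   : {delta:,.0f} bytes")
print(f"total error       : {total:,} bytes")
print(f"read/write growth : {growth:.2f}x")
```

It prints an average of 8,480,772 bytes, a per-batch delta of 1,265,622 bytes, a total error of 78,468,572 bytes, and a read/write growth of about 1.79x for the first batch, which is in the neighborhood of the "2x" in the issue title.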
> Drill consumes 2x memory when sorting and reading a spilled batch from disk.
> ----------------------------------------------------------------------------
>
> Key: DRILL-5885
> URL: https://issues.apache.org/jira/browse/DRILL-5885
> Project: Apache Drill
> Issue Type: Bug
> Components: Execution - Relational Operators
> Affects Versions: 1.11.0
> Reporter: Robert Hou
>
> The query is:
> {noformat}
> select count(*) from (select * from
> dfs.`/drill/testdata/resource-manager/3500cols.tbl` order by
> columns[450],columns[330],columns[230],columns[220],columns[110],columns[90],columns[80],columns[70],columns[40],columns[10],columns[20],columns[30],columns[40],columns[50],
> columns[454],columns[413],columns[940],columns[834],columns[73],columns[140],columns[104],columns[2222],columns[30],columns[2420],columns[1520],columns[1410],
> columns[1110],columns[1290],columns[2380],columns[705],columns[45],columns[1054],columns[2430],columns[420],columns[404],columns[3350],
> columns[3333],columns[153],columns[356],columns[84],columns[745],columns[1450],columns[103],columns[2065],columns[343],columns[3420],columns[530],columns[3210]) d
> where d.col433 = 'sjka skjf';
> {noformat}
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)