[jira] [Commented] (DRILL-5885) Drill consumes 2x memory when sorting and reading a spilled batch from disk.

Robert Hou (JIRA) Tue, 17 Oct 2017 15:57:18 -0700

    [ https://issues.apache.org/jira/browse/DRILL-5885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16208545#comment-16208545 ]


Robert Hou commented on DRILL-5885:
-----------------------------------

From the notes in DRILL-5670.
Further investigation. When spilling, we get these log entries:
{noformat}
Initial output batch allocation: 10566656 bytes, 100 records
Took 52893 us to merge 100 records, consuming 10566656 bytes of memory
{noformat}
The above shows that we are spilling the expected 100 records. The initial 
allocation is good; we didn't resize vectors as we wrote. However, each batch 
consumed 10,566,656 bytes of memory, much more than the 7 MB expected. The 
gross memory used is 105,667 bytes per row, larger than the 84,413 expected.
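
The per-row figure follows directly from the logged numbers. A minimal Python sketch of that arithmetic, using only the values in the log entries and the 84,413-byte expectation quoted above (the variable names are illustrative, not Drill identifiers):

```python
# Values copied from the log entries above.
batch_bytes = 10_566_656      # memory consumed by one output batch
records_per_batch = 100
expected_row_bytes = 84_413   # expected gross bytes per row, as quoted above

# Gross memory per row, and how far it is above the expectation.
gross_row_bytes = batch_bytes / records_per_batch
overhead = gross_row_bytes / expected_row_bytes

print(f"gross bytes per row: {gross_row_bytes:,.0f}")
print(f"batch size: {batch_bytes / 2**20:.2f} MiB")
print(f"observed / expected per row: {overhead:.2f}x")
```

On these numbers the observed gross row size is roughly 25% above the expected one.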
From the file summary:
{noformat}
Summary: Wrote 246281737 bytes to ...
Spilled 52 output batches, each of 10566656 bytes, 100 records
{noformat}
From this we see the spilled size was 4,736,187 bytes per batch (246,281,737 
bytes / 52 batches), or 47,362 bytes per row. This number has no internal 
fragmentation and so should match our "net" record size estimate. Our net 
estimate is 48,101, so we're pretty close. The error should be explained, but 
our estimate is conservative, and so is safe for memory calculations.
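
The on-disk figures can be rechecked the same way. A small Python sketch, assuming only the numbers in the file summary and the 48,101-byte net estimate quoted above (names are illustrative, not Drill identifiers):

```python
# Values copied from the file summary and log entries above.
bytes_written = 246_281_737
batches_spilled = 52
records_per_batch = 100
in_memory_batch_bytes = 10_566_656
net_row_estimate = 48_101     # the "net" record size estimate quoted above

# On-disk size per batch and per row.
disk_batch_bytes = bytes_written / batches_spilled
disk_row_bytes = disk_batch_bytes / records_per_batch

# Relative error of the net estimate, and in-memory vs. on-disk batch size.
estimate_error = (net_row_estimate - disk_row_bytes) / disk_row_bytes
memory_to_disk = in_memory_batch_bytes / disk_batch_bytes

print(f"on-disk bytes per batch: {disk_batch_bytes:,.0f}")
print(f"on-disk bytes per row:   {disk_row_bytes:,.0f}")
print(f"net estimate error:      {estimate_error:.1%}")
print(f"in-memory / on-disk:     {memory_to_disk:.2f}x")
```

On these numbers the net estimate overshoots the on-disk row size by under 2%, and each batch occupies a little over twice as much memory as it does on disk.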

> Drill consumes 2x memory when sorting and reading a spilled batch from disk.
> ----------------------------------------------------------------------------
>
>                 Key: DRILL-5885
>                 URL: https://issues.apache.org/jira/browse/DRILL-5885
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Execution - Relational Operators
>    Affects Versions: 1.11.0
>            Reporter: Robert Hou
>
> The query is:
> {noformat}
> select count(*) from (select * from
> dfs.`/drill/testdata/resource-manager/3500cols.tbl` order by
> columns[450],columns[330],columns[230],columns[220],columns[110],columns[90],columns[80],columns[70],columns[40],columns[10],columns[20],columns[30],columns[40],columns[50],
> columns[454],columns[413],columns[940],columns[834],columns[73],columns[140],columns[104],columns[2222],columns[30],columns[2420],columns[1520],
> columns[1410],
> columns[1110],columns[1290],columns[2380],columns[705],columns[45],columns[1054],columns[2430],columns[420],columns[404],columns[3350],
> columns[3333],columns[153],columns[356],columns[84],columns[745],columns[1450],columns[103],columns[2065],columns[343],columns[3420],columns[530],
> columns[3210] ) d where d.col433 = 'sjka skjf';
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
