[jira] [Commented] (DRILL-5758) Rollup of external sort fixes to issues found by QA

Paul Rogers (JIRA) Thu, 31 Aug 2017 17:19:16 -0700

    [ 
https://issues.apache.org/jira/browse/DRILL-5758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16149825#comment-16149825
 ]


Paul Rogers commented on DRILL-5758:
------------------------------------

It turns out that {{RecordBatchSizer}} had a bug in how it sized repeated 
(array) columns. Consider the output before the fix:

{code}
  rms.mapvalue.col2(type: REPEATED BIGINT, count: 1, total entries: 1, 
per-array: 1, std size: 8, actual size: 52, data size: 52)
...
  Records: 4096, Total size: 1441792, Data size: 376615, Gross row width: 352, 
Net row width: 92, Density: 27}
{code}

In the above, {{col2}} is repeated, yet the sizer reports a value count of 1 
and only 1 entry per array.

Output after the fix:

{code}
  rms.mapvalue.col2(type: REPEATED BIGINT, count: 4096, elements: 12288, 
per-array: 3, std size: 8, actual size: 28, data size: 114688)
...
  Records: 4096, Total size: 1441792, Data size: 1136848, Gross row width: 352, 
Net row width: 278, Density: 79}
{code}

Note that the average number of elements per array is now 3, the estimated 
"net" row width has grown from 92 to 278 bytes, and the density has risen from 
27 to 79.
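
The figures in the two outputs can be cross-checked with simple arithmetic. The sketch below is not the {{RecordBatchSizer}} code; the class and method names are hypothetical, and rounding up is an assumption chosen only because it reproduces the logged values (27 and 79 for density, 92 and 278 for net row width). The 4-byte per-row offset used to explain the "actual size" of 28 is likewise an assumption.

```java
public class SizerArithmetic {

    // Integer division rounded up (assumed rounding; it matches the logged numbers).
    static long ceilDiv(long a, long b) {
        return (a + b - 1) / b;
    }

    // Average number of array elements per row.
    static long perArray(long elements, long rows) {
        return ceilDiv(elements, rows);
    }

    // Net row width: bytes of actual data per row.
    static long netRowWidth(long dataSize, long rows) {
        return ceilDiv(dataSize, rows);
    }

    // Gross row width: bytes of allocated vector memory per row.
    static long grossRowWidth(long totalSize, long rows) {
        return ceilDiv(totalSize, rows);
    }

    // Density: data size as a percentage of total allocated size.
    static long densityPercent(long dataSize, long totalSize) {
        return ceilDiv(dataSize * 100, totalSize);
    }

    public static void main(String[] args) {
        long rows = 4096;
        long totalSize = 1441792;

        // Before the fix: data size 376615.
        System.out.println(netRowWidth(376615, rows));         // 92
        System.out.println(densityPercent(376615, totalSize)); // 27

        // After the fix: data size 1136848.
        System.out.println(netRowWidth(1136848, rows));         // 278
        System.out.println(densityPercent(1136848, totalSize)); // 79

        // Same allocation in both cases.
        System.out.println(grossRowWidth(totalSize, rows));     // 352

        // col2 after the fix: 12288 elements over 4096 rows.
        long avg = perArray(12288, rows);                       // 3
        // Assumed layout: 8-byte BIGINT values plus one 4-byte offset per row.
        long perRow = avg * 8 + 4;                              // 28
        System.out.println(perRow * rows);                      // 114688
    }
}
```

Under these assumptions, the corrected per-column "actual size" (28) and "data size" (114688) both follow from the average of 3 elements per array.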

The result is far more accurate vector size estimates and no vector 
reallocations:

{code}
Initial output batch allocation: 811008 bytes, 3771 records
<Note no vector resizes here.>
Took 4438 us to merge 3771 records, consuming 811008 bytes of memory
{code}
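
To illustrate why an underestimated array length leads to reallocations, here is a minimal sketch. It assumes a vector that grows by doubling its capacity whenever it runs out of room; that growth policy, the class name and the method name are all assumptions made for illustration, not taken from the Drill source. The inputs reuse the per-array figures from the outputs above (1 before the fix, 3 after) for a 4096-row batch of 8-byte values.

```java
public class ReallocSketch {

    // Number of capacity doublings needed before `allocated` bytes can hold
    // `required` bytes, under the assumed doubling growth policy.
    static int doublingsNeeded(long allocated, long required) {
        if (allocated <= 0) {
            throw new IllegalArgumentException("allocated must be positive");
        }
        int doublings = 0;
        long capacity = allocated;
        while (capacity < required) {
            capacity *= 2;
            doublings++;
        }
        return doublings;
    }

    public static void main(String[] args) {
        long rows = 4096;
        long valueWidth = 8;                    // BIGINT
        long required = rows * 3 * valueWidth;  // actual: 3 elements per array

        // Estimate of 1 element per array: allocation is too small.
        System.out.println(doublingsNeeded(rows * 1 * valueWidth, required)); // 2

        // Estimate of 3 elements per array: allocation already fits.
        System.out.println(doublingsNeeded(rows * 3 * valueWidth, required)); // 0
    }
}
```

With the buggy estimate the data buffer would have to grow twice in this model; with the corrected estimate it is sized correctly up front.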

And now the sort completes:
{code}
Results: 4,000,000 records, 63 batches
{code}

> Rollup of external sort fixes to issues found by QA
> ---------------------------------------------------
>
>                 Key: DRILL-5758
>                 URL: https://issues.apache.org/jira/browse/DRILL-5758
>             Project: Apache Drill
>          Issue Type: Task
>    Affects Versions: 1.12.0
>            Reporter: Paul Rogers
>            Assignee: Paul Rogers
>             Fix For: 1.12.0
>
>
> Tracking JIRA to be used for the PR that combines fixes for various JIRA 
> entries. Bugs fixed in this task are given by the linked issues.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)