[ https://issues.apache.org/jira/browse/DRILL-5267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15933846#comment-15933846 ]

Kunal Khatua commented on DRILL-5267:
-------------------------------------

[~paul-rogers] Does [~rkins] need to define tests specifically for this issue? How
do we verify that it is fixed? The fix appears to come from the PR for DRILL-5266.

> Managed external sort spills too often with Parquet data
> --------------------------------------------------------
>
>                 Key: DRILL-5267
>                 URL: https://issues.apache.org/jira/browse/DRILL-5267
>             Project: Apache Drill
>          Issue Type: Sub-task
>    Affects Versions: 1.10.0
>            Reporter: Paul Rogers
>            Assignee: Paul Rogers
>             Fix For: 1.10.0
>
>
> DRILL-5266 describes how Parquet produces low-density record batches. As a
> result, the external sort spills more often than it should, because it sizes
> spill files based on the overall batch size rather than on the data content of
> the batch. Since Parquet batches are 95% empty space, the spill files end up
> far too small.
> Base the spill calculations on actual data content, not on the size of the
> overall record batch.
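The adjustment the description asks for can be illustrated with a small sketch. This is a hypothetical example, not Drill's actual implementation: the class `SpillSizingSketch`, the `BatchStats` holder, the 1 MB spill-batch target, and the batch figures are all assumptions, chosen to mirror the "95% empty" case. It contrasts the number of rows per spill batch computed from the memory allocated to a batch with the same figure computed from the data the batch actually holds.

```java
/**
 * Hypothetical sketch: size spill work by the data actually held in a
 * record batch rather than by the memory allocated for its vectors.
 * None of these names are Drill APIs.
 */
public final class SpillSizingSketch {

    /** Summary of one incoming record batch (hypothetical). */
    static final class BatchStats {
        final int rowCount;
        final long allocatedBytes; // memory reserved by the value vectors
        final long dataBytes;      // bytes actually occupied by values

        BatchStats(int rowCount, long allocatedBytes, long dataBytes) {
            this.rowCount = rowCount;
            this.allocatedBytes = allocatedBytes;
            this.dataBytes = dataBytes;
        }

        /** Fraction of the allocation that holds real data, 0.0 to 1.0. */
        double density() {
            return allocatedBytes == 0 ? 0.0 : (double) dataBytes / allocatedBytes;
        }

        /** Average row width based on actual data content (at least 1 byte). */
        long dataBytesPerRow() {
            return rowCount == 0 ? 0 : Math.max(1, dataBytes / rowCount);
        }

        /** Average row width if the allocation size is used instead (at least 1 byte). */
        long allocatedBytesPerRow() {
            return rowCount == 0 ? 0 : Math.max(1, allocatedBytes / rowCount);
        }
    }

    /** Rows that fit in one spill batch of the given target size. */
    static long rowsPerSpillBatch(long targetSpillBatchBytes, long bytesPerRow) {
        return bytesPerRow == 0 ? 0 : targetSpillBatchBytes / bytesPerRow;
    }

    public static void main(String[] args) {
        // A low-density batch: 1,000 rows, 200,000 bytes allocated,
        // only 10,000 bytes of real data (95% empty).
        BatchStats batch = new BatchStats(1_000, 200_000, 10_000);
        long target = 1_000_000; // hypothetical 1 MB spill batch target

        System.out.println("density = " + batch.density());
        System.out.println("rows per spill batch (by allocation) = "
            + rowsPerSpillBatch(target, batch.allocatedBytesPerRow()));
        System.out.println("rows per spill batch (by data) = "
            + rowsPerSpillBatch(target, batch.dataBytesPerRow()));
    }
}
```

With these assumed figures, sizing by allocation yields 5,000 rows per spill batch, while sizing by data content yields 100,000, so each spill carries roughly 20 times as much real data.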



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
