[
https://issues.apache.org/jira/browse/DRILL-5267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15933846#comment-15933846
]
Kunal Khatua commented on DRILL-5267:
-------------------------------------
[~paul-rogers] Does [~rkins] need to define tests specifically for this issue?
How do we verify that it is fixed? The fix appears to come from the PR for
DRILL-5266.
> Managed external sort spills too often with Parquet data
> --------------------------------------------------------
>
> Key: DRILL-5267
> URL: https://issues.apache.org/jira/browse/DRILL-5267
> Project: Apache Drill
> Issue Type: Sub-task
> Affects Versions: 1.10.0
> Reporter: Paul Rogers
> Assignee: Paul Rogers
> Fix For: 1.10.0
>
>
> DRILL-5266 describes how Parquet produces low-density record batches. As a
> result, the external sort spills more frequently than it should, because it
> sizes spill files based on the size of the batch, not on the data content of
> the batch. Since Parquet batches are 95% empty space, the spill files end up
> far too small.
> Adjust the spill calculations based on actual data content, not the size of
> the overall record batch.
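
To make the arithmetic in the description concrete, here is a minimal sketch of the two sizing strategies. It is not Drill's code: the function name, the 256 MiB spill target, and the 16 MiB batch size are assumptions chosen for illustration. Only the 95% empty-space figure comes from the description above.

```python
# Hypothetical illustration only; none of these names or sizes come from Drill.

def batches_per_spill(target_spill_bytes: int, bytes_per_batch: int) -> int:
    """Return how many batches to buffer before writing one spill file.

    bytes_per_batch is whatever size estimate the sort charges against the
    spill target for each batch: allocated size or actual data size.
    """
    if target_spill_bytes <= 0 or bytes_per_batch <= 0:
        raise ValueError("sizes must be positive")
    # Always buffer at least one batch, even if it exceeds the target.
    return max(1, target_spill_bytes // bytes_per_batch)


TARGET_SPILL = 256 * 1024 * 1024   # assumed spill file target (256 MiB)
ALLOCATED = 16 * 1024 * 1024       # assumed allocated size of one batch (16 MiB)
ACTUAL_DATA = ALLOCATED * 5 // 100  # 95% empty space leaves 5% real data

# Sizing by allocated batch size: the sort thinks it is full after few batches.
by_batch_size = batches_per_spill(TARGET_SPILL, ALLOCATED)

# Sizing by data content: many more batches fit in the same spill file.
by_data_content = batches_per_spill(TARGET_SPILL, ACTUAL_DATA)

# Real data written per spill file when sizing by allocated batch size.
data_written_when_sized_by_batch = by_batch_size * ACTUAL_DATA
```

Under these assumed numbers, sizing by allocated batch size triggers a spill after 16 batches that together hold only about 13 MB of real data, while sizing by data content buffers 320 batches before spilling.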