[ https://issues.apache.org/jira/browse/DRILL-5267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Paul Rogers resolved DRILL-5267.
--------------------------------
Resolution: Fixed
Fix Version/s: 1.10.0 (was: 1.10)
> Managed external sort spills too often with Parquet data
> --------------------------------------------------------
>
> Key: DRILL-5267
> URL: https://issues.apache.org/jira/browse/DRILL-5267
> Project: Apache Drill
> Issue Type: Sub-task
> Affects Versions: 1.10
> Reporter: Paul Rogers
> Assignee: Paul Rogers
> Fix For: 1.10.0
>
>
> DRILL-5266 describes how Parquet produces low-density record batches. As a
> result, the external sort spills more frequently than it should: it sizes
> spill files based on the size of each batch, not on the data the batch
> actually contains. Since Parquet batches are 95% empty space, the spill files
> end up far too small.
>
> Adjust the spill calculations to use the actual data content, not the size of
> the overall record batch.
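To illustrate the adjustment described above, here is a minimal Java sketch. It is not Drill's actual code: the class SpillSizingSketch, the BatchStats type, the batchesPerSpill helper, and the 16 MiB batch / 256 MiB spill file figures are all hypothetical, chosen only to show how sizing by data content changes the number of batches written per spill.

```java
public class SpillSizingSketch {

    /** Hypothetical per-batch summary; not a Drill class. */
    static final class BatchStats {
        final long allocatedBytes; // memory held by the batch's buffers
        final long dataBytes;      // bytes actually occupied by row data

        BatchStats(long allocatedBytes, long dataBytes) {
            this.allocatedBytes = allocatedBytes;
            this.dataBytes = dataBytes;
        }

        /** Fraction of the allocated memory that holds real data. */
        double density() {
            return allocatedBytes == 0 ? 0.0 : (double) dataBytes / allocatedBytes;
        }
    }

    /**
     * Number of batches to accumulate so that one spill file approaches
     * the target size, given an estimate of bytes written per batch.
     */
    static long batchesPerSpill(long targetSpillBytes, long bytesPerBatch) {
        if (bytesPerBatch <= 0) {
            throw new IllegalArgumentException("bytesPerBatch must be positive");
        }
        return Math.max(1, targetSpillBytes / bytesPerBatch);
    }

    public static void main(String[] args) {
        // Assumed numbers: a 16 MiB batch that is roughly 5% data.
        long allocated = 16L * 1024 * 1024;
        BatchStats batch = new BatchStats(allocated, allocated / 20);
        long targetSpillBytes = 256L * 1024 * 1024;

        // Sizing by overall batch size: few batches, so a small file on disk.
        long byBatchSize = batchesPerSpill(targetSpillBytes, batch.allocatedBytes);
        // Sizing by data content: many more batches per spill file.
        long byDataContent = batchesPerSpill(targetSpillBytes, batch.dataBytes);

        System.out.printf("density=%.3f byBatchSize=%d byDataContent=%d%n",
                batch.density(), byBatchSize, byDataContent);
        System.out.printf("data written when sizing by batch size: %d bytes%n",
                byBatchSize * batch.dataBytes);
    }
}
```

With these assumed numbers, sizing by batch size yields 16 batches per spill, which hold only about 12.8 MiB of real data against a 256 MiB target. Sizing by data content yields 320 batches per spill, so each file lands close to the target and the sort spills far less often.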
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)