[ https://issues.apache.org/jira/browse/DRILL-5019?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Paul Rogers resolved DRILL-5019.
--------------------------------
Resolution: Fixed
> ExternalSortBatch spills all batches to disk even if only one spills
> --------------------------------------------------------------------
>
> Key: DRILL-5019
> URL: https://issues.apache.org/jira/browse/DRILL-5019
> Project: Apache Drill
> Issue Type: Sub-task
> Affects Versions: 1.8.0
> Reporter: Paul Rogers
> Assignee: Paul Rogers
> Priority: Minor
>
> The ExternalSortBatch (ESB) operator sorts batches while spilling to disk to
> stay within a defined memory budget.
> Assume the memory budget is 10 GB and the actual volume of data to be
> sorted is 10.1 GB. The ESB must spill at least the extra 0.1 GB to disk. (In
> practice it spills more than that, say 5 GB.)
> At the completion of the run, ESB has read all incoming batches. It must now
> merge those batches. It does so by spilling **all** batches to disk, then
> doing a disk-based merge.
> This means that exceeding the memory limit by even a small amount is the same
> as having a very low memory limit: all batches must spill.
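To put rough numbers on the scenario quoted above, here is a back-of-the-envelope sketch. It assumes an idealized model in which every spilled gigabyte is written once and read back once, and it compares spilling everything against the theoretical lower bound of spilling only the excess. The function name and the model are illustrative assumptions, not taken from the Drill code.

```python
def spill_io_gb(data_gb, budget_gb, spill_all):
    """Return total disk traffic (GB written plus GB read back) for one sort."""
    if data_gb <= budget_gb:
        return 0.0  # everything fits in memory; nothing is spilled
    # Either every batch goes to disk, or only the part that exceeds the budget.
    spilled = data_gb if spill_all else data_gb - budget_gb
    return 2 * spilled  # each spilled GB is written once and read once


# The example from the issue: 10 GB budget, 10.1 GB of data.
current = spill_io_gb(10.1, 10.0, spill_all=True)    # spill all batches
minimal = spill_io_gb(10.1, 10.0, spill_all=False)   # spill only the excess
print(f"spill all: {current:.1f} GB of I/O, spill excess only: {minimal:.1f} GB")
```

A real run would land somewhere between the two figures, since (as noted above) the operator spills more than the bare excess.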
> This approach is simple, it works, and there is some logic to it.
> However, it would be better to have a slightly more advanced solution that
> spills only the smallest possible set of batches to disk, then does a hybrid
> in-memory, on-disk merge, saving the unnecessary write/read cycle.
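The hybrid merge proposed above could look roughly like the following. This is a minimal sketch, not the Drill implementation: the budget is counted in rows rather than bytes, the "spill the oldest run first" policy is an arbitrary choice, and `spill_run`, `read_run`, `plan_spills` and `hybrid_merge` are hypothetical names invented for illustration.

```python
import heapq
import os
import pickle
import tempfile


def spill_run(sorted_rows):
    """Write one sorted run to a temporary file and return its path."""
    fd, path = tempfile.mkstemp(suffix=".run")
    with os.fdopen(fd, "wb") as f:
        for row in sorted_rows:
            pickle.dump(row, f)
    return path


def read_run(path):
    """Stream rows back from a spilled run, one row at a time."""
    with open(path, "rb") as f:
        while True:
            try:
                yield pickle.load(f)
            except EOFError:
                return


def plan_spills(batches, budget_rows):
    """Sort each batch; spill runs only while the in-memory rows exceed the budget."""
    in_memory, spilled, held = [], [], 0
    for batch in batches:
        run = sorted(batch)
        in_memory.append(run)
        held += len(run)
        # Assumed policy: spill the oldest run until we are back within budget.
        while held > budget_rows:
            victim = in_memory.pop(0)
            spilled.append(spill_run(victim))
            held -= len(victim)
    return in_memory, spilled


def hybrid_merge(in_memory, spilled):
    """Lazily k-way merge in-memory runs with runs streamed from disk."""
    return heapq.merge(*in_memory, *(read_run(p) for p in spilled))


# Ten rows against a nine-row budget: only one run has to go to disk.
batches = [[5, 1, 9], [4, 8, 2], [7, 3, 6], [0]]
in_memory, spilled = plan_spills(batches, budget_rows=9)
result = list(hybrid_merge(in_memory, spilled))
for p in spilled:
    os.remove(p)  # clean up the temporary spill files
```

In this toy run three of the four sorted runs stay in memory and are merged directly with the single run read back from disk, so the runs that never needed to spill avoid the extra write/read cycle.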