[
https://issues.apache.org/jira/browse/DRILL-5023?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Paul Rogers updated DRILL-5023:
-------------------------------
Fix Version/s: (was: 1.10.0)
1.11.0
> ExternalSortBatch does not spill fully, throws off spill calculations
> ---------------------------------------------------------------------
>
> Key: DRILL-5023
> URL: https://issues.apache.org/jira/browse/DRILL-5023
> Project: Apache Drill
> Issue Type: Sub-task
> Affects Versions: 1.8.0
> Reporter: Paul Rogers
> Assignee: Paul Rogers
> Priority: Minor
> Fix For: 1.11.0
>
>
> The {{ExternalSortBatch}} (ESB) operator sorts records, spilling to disk as
> needed to operate within a defined memory budget.
> When needed, ESB spills accumulated record batches to disk. However, when
> doing so, the ESB carves off the first spillable batch and holds it in memory:
> {code}
> // 1 output container is kept in memory, so we want to hold on to it and
> // transferClone allows keeping ownership
> VectorContainer c1 = VectorContainer.getTransferClone(outputContainer, oContext);
> c1.buildSchema(BatchSchema.SelectionVectorMode.NONE);
> c1.setRecordCount(count);
> ...
> BatchGroup newGroup = new BatchGroup(c1, fs, outputFile, oContext);
> {code}
> When the spill batch size gets larger (to fix DRILL-5022), the result is that
> nothing is spilled: the first spillable batch, which may now hold all of the
> data to be spilled, is simply stored back into memory on the (supposedly)
> spilled batches list.
> The desired behavior is for all spillable batches to be written to disk. If
> the first batch is held back to work around some issue (to keep a schema,
> say?), then find a different solution that allows the actual data to spill.
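
To make the difference between the current and the desired behavior concrete, here is a minimal toy sketch. It does not use Drill's classes or APIs: Batch, SpillResult, spillHoldingFirst and spillAll are hypothetical names invented for illustration, "disk" is just a list, and the input list is assumed to be non-empty. It only models the bookkeeping: holding the first batch back leaves its rows in memory, while retaining just the schema lets every batch be written out.

```java
import java.util.ArrayList;
import java.util.List;

public class SpillSketch {

    /** Hypothetical stand-in for a record batch: a schema label plus row values. */
    static final class Batch {
        final String schema;
        final int[] rows;

        Batch(String schema, int[] rows) {
            this.schema = schema;
            this.rows = rows;
        }
    }

    /** Hypothetical result of one spill: what went to "disk" and what stayed in memory. */
    static final class SpillResult {
        final String schema;                              // metadata only, always retained
        final List<int[]> onDisk = new ArrayList<>();     // rows written out
        final List<int[]> inMemory = new ArrayList<>();   // rows still held in memory

        SpillResult(String schema) {
            this.schema = schema;
        }

        int rowsInMemory() {
            int n = 0;
            for (int[] r : inMemory) {
                n += r.length;
            }
            return n;
        }
    }

    /** Models the behavior described in this issue: the first batch is held back. */
    static SpillResult spillHoldingFirst(List<Batch> batches) {
        SpillResult result = new SpillResult(batches.get(0).schema);
        for (int i = 0; i < batches.size(); i++) {
            if (i == 0) {
                result.inMemory.add(batches.get(i).rows);
            } else {
                result.onDisk.add(batches.get(i).rows);
            }
        }
        return result;
    }

    /** Models the desired behavior: keep only the schema, write every batch out. */
    static SpillResult spillAll(List<Batch> batches) {
        SpillResult result = new SpillResult(batches.get(0).schema);
        for (Batch b : batches) {
            result.onDisk.add(b.rows);
        }
        return result;
    }

    public static void main(String[] args) {
        // A single large spill batch, as can happen once the spill batch size grows.
        List<Batch> batches = new ArrayList<>();
        batches.add(new Batch("a:INT", new int[] {1, 2, 3, 4}));

        System.out.println(spillHoldingFirst(batches).rowsInMemory()); // prints 4
        System.out.println(spillAll(batches).rowsInMemory());          // prints 0
    }
}
```

With a single large batch, the first variant frees no row memory at all, which is the effect described above. Whether the real fix can retain just the schema depends on why the first batch was held back in the first place.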
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)