[ https://issues.apache.org/jira/browse/DRILL-5023?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Paul Rogers updated DRILL-5023:
-------------------------------
    Description: 
The {{ExternalSortBatch}} (ESB) operator sorts records, spilling to disk as 
needed to operate within a defined memory budget.

When needed, ESB spills accumulated record batches to disk. However, when doing 
so, the ESB carves off the first spillable batch and holds it in memory:

{code}
    // 1 output container is kept in memory, so we want to hold on to it and transferClone
    // allows keeping ownership
    VectorContainer c1 = VectorContainer.getTransferClone(outputContainer, oContext);
    c1.buildSchema(BatchSchema.SelectionVectorMode.NONE);
    c1.setRecordCount(count);
...
    BatchGroup newGroup = new BatchGroup(c1, fs, outputFile, oContext);
{code}

When the spill batch size gets larger (to fix DRILL-5022), nothing is actually 
spilled: the first spillable batch is simply stored back into memory on the 
(supposedly) spilled batches list.
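
To make the effect concrete, here is a toy model, not Drill code. It assumes that the records chosen for spilling are merged into output batches of at most {{spillBatchSize}} records and that the first such batch stays in memory. The class and method names are hypothetical.

```java
// Toy model only: these names are hypothetical and are not Drill APIs.
final class HeldBackSpillModel {

    /**
     * Records that actually reach disk when the first merged output batch
     * (at most spillBatchSize records) is kept in memory instead of written.
     */
    static int recordsWrittenToDisk(int recordsToSpill, int spillBatchSize) {
        if (recordsToSpill < 0 || spillBatchSize <= 0) {
            throw new IllegalArgumentException("counts must be positive");
        }
        // Assumption: the held-back batch is as full as it can be.
        int heldInMemory = Math.min(recordsToSpill, spillBatchSize);
        return recordsToSpill - heldInMemory;
    }

    public static void main(String[] args) {
        int recordsToSpill = 10_000;
        for (int size : new int[] {1_000, 4_000, 16_000}) {
            System.out.printf("spill batch size %d: %d of %d records written%n",
                size, recordsWrittenToDisk(recordsToSpill, size), recordsToSpill);
        }
    }
}
```

Under these assumptions, once the spill batch size reaches the number of records selected for spilling, the model writes nothing to disk, which matches the behavior described above.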

The desired behavior is for all spillable batches to be written to disk. If the 
first batch is held back to work around some issue (to keep a schema, say?), 
then find a different solution that allows the actual data to spill.
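
One possible shape for such a solution, sketched with plain Java stand-ins rather than Drill's {{VectorContainer}} and {{BatchGroup}}: write every batch to the spill file and keep only the schema and some bookkeeping in memory. All names below ({{SpillAllSketch}}, {{SpilledRun}}, {{spillAll}}) are hypothetical, and the file layout is invented for illustration.

```java
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-ins for illustration; none of these types are Drill classes.
final class SpillAllSketch {

    /** What stays in memory after a spill: schema and bookkeeping, but no row data. */
    static final class SpilledRun {
        final List<String> schema;
        final Path file;
        final int recordCount;
        final long bytesOnDisk;

        SpilledRun(List<String> schema, Path file, int recordCount, long bytesOnDisk) {
            this.schema = Collections.unmodifiableList(new ArrayList<>(schema));
            this.file = file;
            this.recordCount = recordCount;
            this.bytesOnDisk = bytesOnDisk;
        }
    }

    /** Writes every batch, including the first, to the given file. */
    static SpilledRun spillAll(List<String> schema, List<int[]> batches, Path file) {
        int records = 0;
        try {
            try (DataOutputStream out = new DataOutputStream(Files.newOutputStream(file))) {
                for (int[] batch : batches) {
                    out.writeInt(batch.length); // batch header: record count
                    for (int value : batch) {
                        out.writeInt(value);
                    }
                    records += batch.length;
                }
            }
            return new SpilledRun(schema, file, records, Files.size(file));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Spills two small batches to a temporary file and returns the retained metadata. */
    static SpilledRun demo() {
        try {
            Path file = Files.createTempFile("spill-sketch", ".bin");
            try {
                return spillAll(Arrays.asList("key"),
                    Arrays.asList(new int[] {3, 1, 2}, new int[] {9, 8}), file);
            } finally {
                Files.deleteIfExists(file); // clean up the demo file
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        SpilledRun run = demo();
        System.out.println(run.recordCount + " records, " + run.bytesOnDisk
            + " bytes written, schema " + run.schema);
    }
}
```

The point of the sketch is only that the schema can be retained separately from the data, so no batch has to be pinned in memory to preserve it. Whether that is the actual reason the first batch is held back is not established here.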

  was:
The {{ExternalSortBatch}} (ESB) operator sorts records, spilling to disk as 
needed to operate within a defined memory budget.

When needed, ESB spills accumulated record batches to disk. However, when doing 
so, the ESB carves off the first spillable batch and holds it in memory:

{{code}}
    // 1 output container is kept in memory, so we want to hold on to it and 
transferClone
    // allows keeping ownership
    VectorContainer c1 = VectorContainer.getTransferClone(outputContainer, 
oContext);
    c1.buildSchema(BatchSchema.SelectionVectorMode.NONE);
    c1.setRecordCount(count);
...
    BatchGroup newGroup = new BatchGroup(c1, fs, outputFile, oContext);
}}

When the spill batch size gets larger (to fix DRILL-5022), the result is that 
nothing is spilled as the first spillable batch is simply stored back into 
memory on the (supposedly) spilled batches list.

The desired behavior is for all spillable batches to be written to disk. If the 
first batch is held back to work around some issue (to keep a schema, say?), 
then fine a different solution that allows the actual data to spill.


> ExternalSortBatch does not spill fully, throws off spill calculations
> ---------------------------------------------------------------------
>
>                 Key: DRILL-5023
>                 URL: https://issues.apache.org/jira/browse/DRILL-5023
>             Project: Apache Drill
>          Issue Type: Bug
>    Affects Versions: 1.8.0
>            Reporter: Paul Rogers
>            Priority: Minor
>


