[
https://issues.apache.org/jira/browse/DRILL-5027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15653067#comment-15653067
]
Paul Rogers commented on DRILL-5027:
------------------------------------
It appears that the above comment is slightly off. The actual behavior is
that when the number of previously-spilled batches hits some threshold, all
of the previously spilled batches are merged into a new batch.
This process does limit the amount of memory used, but at the cost of
rewriting the same data over and over. Consider:
{code}
1 2 3 4 (respill)
1234 5 6 7 (respill)
1234567 8 9 10
{code}
That is, batches 1, 2, 3 and 4 are re-spilled to create 1234, and so on.
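As a toy illustration of that policy (hypothetical names and threshold, not
the actual ESB code): once the run count hits the threshold, every existing
run, merged or not, is read back and rewritten.
{code:java}
import java.util.ArrayList;
import java.util.List;

public class MergeAllPolicy {
  static final int THRESHOLD = 4;

  public static void main(String[] args) {
    List<String> runs = new ArrayList<>();
    for (int batch = 1; batch <= 10; batch++) {
      if (runs.size() == THRESHOLD) {
        // ALL runs so far, including products of earlier merges,
        // are read back and rewritten into one new run.
        String merged = String.join("", runs);
        runs.clear();
        runs.add(merged);
        System.out.println("respill -> " + merged);
      }
      runs.add(Integer.toString(batch));
    }
    System.out.println("final: " + runs);
  }
}
{code}
This reproduces the diagram: two respills, ending with runs 1234567, 8, 9,
10, and batch 1's data has already been rewritten twice.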
The same leading data is rewritten over and over. Better would be to
re-spill only the newer data:
{code}
1 2 3 4 (re-spill)
1234 | 5 6 7 8 (re-spill)
1234 5678 | 9 10 11
{code}
Here the bar marks the moving point behind which runs have already been
merged and need not be rewritten again.
At some point the process may have to repeat by merging the second-generation
spill files and so on.
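A matching sketch of the suggested policy (again hypothetical, not ESB
code): keep a frontier and consolidate only the runs spilled since the
last merge.
{code:java}
import java.util.ArrayList;
import java.util.List;

public class MergeNewestPolicy {
  static final int THRESHOLD = 4;

  public static void main(String[] args) {
    List<String> behindFrontier = new ArrayList<>(); // already merged
    List<String> fresh = new ArrayList<>();          // newer, unmerged runs
    for (int batch = 1; batch <= 11; batch++) {
      if (fresh.size() == THRESHOLD) {
        // Only the fresh runs are read back and rewritten; data
        // behind the frontier is not touched.
        behindFrontier.add(String.join("", fresh));
        fresh.clear();
      }
      fresh.add(Integer.toString(batch));
    }
    System.out.println(behindFrontier + " | " + fresh);
    // If behindFrontier itself grows past some limit, the same
    // consolidation repeats over those second-generation runs.
  }
}
{code}
This ends with 1234, 5678 | 9, 10, 11, as in the second diagram, and each
batch is rewritten at most once per generation.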
> ExternalSortBatch can use excessive memory for very large queries
> -----------------------------------------------------------------
>
> Key: DRILL-5027
> URL: https://issues.apache.org/jira/browse/DRILL-5027
> Project: Apache Drill
> Issue Type: Bug
> Affects Versions: 1.8.0
> Reporter: Paul Rogers
> Priority: Minor
>
> The {{ExternalSortBatch}} (ESB) operator sorts data while spilling to disk as
> needed to operate within a memory budget.
> The sort happens in two phases:
> 1. Gather the incoming batches from the upstream operator, sort them, and
> spill to disk as needed.
> 2. Merge the "runs" spilled in step 1.
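> Read literally, the two steps are a classic external merge sort. Here is
> a self-contained toy of the two phases, with in-memory lists standing in
> for spill files; the names are illustrative and are not the ESB internals:
> {code:java}
> import java.util.ArrayList;
> import java.util.Collections;
> import java.util.Comparator;
> import java.util.List;
> import java.util.PriorityQueue;
>
> public class TwoPhaseSortToy {
>   public static void main(String[] args) {
>     int[] input = {5, 3, 9, 1, 8, 2, 7, 4, 6, 0};
>     int runSize = 3; // stand-in for the memory budget
>
>     // Phase 1: fill memory, sort, "spill" a run, repeat.
>     List<List<Integer>> runs = new ArrayList<>();
>     List<Integer> buffer = new ArrayList<>();
>     for (int v : input) {
>       buffer.add(v);
>       if (buffer.size() == runSize) {
>         Collections.sort(buffer);
>         runs.add(new ArrayList<>(buffer));
>         buffer.clear();
>       }
>     }
>     if (!buffer.isEmpty()) {
>       Collections.sort(buffer);
>       runs.add(buffer);
>     }
>
>     // Phase 2: k-way merge. One cursor (i.e. one batch) per run must
>     // be resident in memory at the same time; that is the crux here.
>     Comparator<int[]> byValue =
>         Comparator.comparingInt(c -> runs.get(c[0]).get(c[1]));
>     PriorityQueue<int[]> heap = new PriorityQueue<>(byValue);
>     for (int r = 0; r < runs.size(); r++) {
>       heap.add(new int[] {r, 0});
>     }
>     while (!heap.isEmpty()) {
>       int[] c = heap.poll();
>       System.out.print(runs.get(c[0]).get(c[1]) + " ");
>       if (c[1] + 1 < runs.get(c[0]).size()) {
>         heap.add(new int[] {c[0], c[1] + 1});
>       }
>     }
>   }
> }
> {code}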
> In most cases, the second step should run within the memory available for the
> first step (which is why severity is only Minor). However, if the query is
> exceptionally large, then the second step can use excessive memory.
> Here is why.
> * In step 1, we create a series of n spill files. Each file contains batches
> of some maximum size, say 20 MB.
> * In step 2, we must simultaneously open all the spilled files and read the
> first batch into memory in preparation for merging.
> Suppose that the query has 1 TB of data, that 10 GB of memory is
> available, and that each spill file holds about 1 GB of data. The result
> will be 1 TB / 1 GB = 1000 spill files. Suppose each batch in each file
> is 20 MB. A single-pass merge will need 20 MB * 1000 = 20 GB to operate.
> But we only have 10 GB.
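> A back-of-the-envelope check of these numbers (powers of two standing in
> for the round figures above):
> {code:java}
> public class MergeMemoryCheck {
>   public static void main(String[] args) {
>     long data   = 1L << 40;  // ~1 TB of query data
>     long run    = 1L << 30;  // ~1 GB spilled per file
>     long batch  = 20L << 20; // 20 MB per batch in a file
>     long memory = 10L << 30; // 10 GB memory budget
>
>     long files = data / run;    // 1024, i.e. ~1000 spill files
>     long need  = files * batch; // ~20 GB for a single-pass merge
>     System.out.println(files + " spill files; single-pass merge needs "
>         + (need >> 30) + " GB, but the budget is "
>         + (memory >> 30) + " GB");
>   }
> }
> {code}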
> The typical solution is to perform the merge in multiple phases, with each
> phase reading and re-spilling only as many runs as will fit in memory. In
> the above case, the first phase would make two iterations, merging 500 GB
> in each using 10 GB of memory. The second phase would make a single
> iteration to merge the two first-phase files.
> The result is a great deal of disk I/O, and a requirement for sufficient
> disk space to store two sets of spill files (one from phase i-1, another
> for phase i). But the query will complete within the memory budgeted.
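> The fan-in per phase, and hence the number of phases, follows directly
> from the budget. A hypothetical sketch of that calculation (not the
> ESB's actual merge planner):
> {code:java}
> public class MergePhasePlan {
>   public static void main(String[] args) {
>     long memory = 10L << 30;            // 10 GB budget
>     long batch  = 20L << 20;            // 20 MB resident per open run
>     int fanIn = (int) (memory / batch); // 512: runs mergeable in one pass
>     int runs = 1000;                    // first-generation spill files
>     int phase = 0;
>     while (runs > 1) {
>       runs = (runs + fanIn - 1) / fanIn; // ceil: each pass re-spills
>       phase++;
>       System.out.println("after phase " + phase + ": " + runs + " run(s)");
>     }
>   }
> }
> {code}
> With a 10 GB budget and 20 MB batches the fan-in is 512 runs, so 1000
> runs collapse in two phases, matching the description above.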
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)