[
https://issues.apache.org/jira/browse/DRILL-5011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Paul Rogers resolved DRILL-5011.
--------------------------------
Resolution: Fixed
> External Sort Batch memory use depends on record width
> ------------------------------------------------------
>
> Key: DRILL-5011
> URL: https://issues.apache.org/jira/browse/DRILL-5011
> Project: Apache Drill
> Issue Type: Sub-task
> Affects Versions: 1.8.0
> Reporter: Paul Rogers
> Assignee: Paul Rogers
> Priority: Minor
>
> The ExternalSortBatch operator uses spill-to-disk to keep memory needs within
> a defined limit. However, the "copier" (really, the merge operation) can use
> an amount of memory determined not by the operator configuration, but by the
> width of each record.
> The copier memory limit appears to be set by the COPIER_BATCH_MEM_LIMIT value.
> However, the actual memory use is determined by the number of records that
> the copier is asked to copy. That count comes from an estimate of row width
> based on the type of each column. Note that the row width *is not* based on
> the actual data in each row. Varchar fields, for example, are assumed to be
> 40 characters wide. If the sorter is asked to sort records with Varchar
> fields of, say, 1000 characters, then the row width estimate will be a poor
> estimator of actual width.
> Memory use is based on a target record count:
> {code}
> target record count = memory limit / estimated row width
> {code}
> Actual memory use is:
> {code}
> memory use = target record count * actual row width
> {code}
> Which is:
> {code}
> memory use = memory limit * actual row width / estimated row width
> {code}
> That is, memory use depends on the ratio of actual to estimated width. If the
> estimate is off by a factor of 2, then we use twice as much memory as expected.
> Note that the memory used for the copier defaults to 20 MB, so even an error
> of 4x still means only 80 MB of memory used; small in comparison to the many
> GB typically allocated to ESB storage.
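The arithmetic in the description can be sketched in Java as follows. This is an illustration only, not Drill code: the class and method names are hypothetical, the 20 MB limit and the 40 versus 1000 Varchar widths come from the description (treating one character as one byte, which is an assumption), and a single-column row with no vector overhead is assumed.

```java
// Hypothetical sketch of the memory arithmetic described in DRILL-5011.
// None of these names exist in Drill; they are for illustration only.
class CopierMemoryEstimate {

    // target record count = memory limit / estimated row width
    public static long targetRecordCount(long memoryLimitBytes, long estimatedRowWidth) {
        return memoryLimitBytes / estimatedRowWidth;
    }

    // memory use = target record count * actual row width
    public static long actualMemoryUse(long memoryLimitBytes,
                                       long estimatedRowWidth,
                                       long actualRowWidth) {
        return targetRecordCount(memoryLimitBytes, estimatedRowWidth) * actualRowWidth;
    }

    public static void main(String[] args) {
        long limit = 20L * 1024 * 1024; // 20 MB default cited in the description
        long estimatedWidth = 40;       // assumed Varchar width, one byte per character
        long actualWidth = 1000;        // wide Varchar example from the description

        long records = targetRecordCount(limit, estimatedWidth);
        long used = actualMemoryUse(limit, estimatedWidth, actualWidth);

        System.out.println("target record count: " + records);
        System.out.println("actual memory use (MB): " + used / (1024 * 1024));
    }
}
```

Under these assumptions the 1000-character case overshoots the limit by the width ratio of 25, so the 20 MB budget becomes roughly 500 MB, well beyond the 4x case discussed above.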