[ 
https://issues.apache.org/jira/browse/DRILL-5011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paul Rogers resolved DRILL-5011.
--------------------------------
    Resolution: Fixed

> External Sort Batch memory use depends on record width
> ------------------------------------------------------
>
>                 Key: DRILL-5011
>                 URL: https://issues.apache.org/jira/browse/DRILL-5011
>             Project: Apache Drill
>          Issue Type: Sub-task
>    Affects Versions: 1.8.0
>            Reporter: Paul Rogers
>            Assignee: Paul Rogers
>            Priority: Minor
>
> The ExternalSortBatch operator uses spill-to-disk to keep memory needs within 
> a defined limit. However, the "copier" (really, the merge operation) can use 
> an amount of memory determined not by the operator configuration, but by the 
> width of each record.
> The copier memory limit appears to be set by the COPIER_BATCH_MEM_LIMIT value.
> However, the actual memory use is determined by the number of records that 
> the copier is asked to copy. That record comes from an estimate of row width 
> based on the type of each column. Note that the row width *is not* based on 
> the actual data in each row. Varchar fields, for example, are assumed to be 
> 40 characters wide. If the sorter is asked to sort records with Varchar 
> fields of, say, 1000 characters, then the row width estimate will be a poor 
> estimator of actual width.
> Memory use is based on a
> {code}
> target record count = memory limit / estimate row width
> {code}
> Actual memory use is:
> {code}
> memory use = target row count * actual row width
> {code}
> Which is
> {code}
> memory use = memory limit * actual row width  / estimate row width
> {code}
> That is, memory use depends on the ratio of actual to estimated width. If the 
> estimate is off by 2, then we use twice as much memory as expected.
> Not that the memory used for the copier defaults to 20 MB, so even an error 
> of 4x still means only 80 MB of memory used; small in comparison to the many 
> GB typically allocated to ESB storage.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

Reply via email to