[ 
https://issues.apache.org/jira/browse/DRILL-5601?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16093513#comment-16093513
 ] 

ASF GitHub Bot commented on DRILL-5601:
---------------------------------------

Github user paul-rogers commented on a diff in the pull request:

    https://github.com/apache/drill/pull/860#discussion_r128144743
  
    --- Diff: exec/vector/src/main/codegen/templates/VariableLengthVectors.java ---
    @@ -247,27 +249,26 @@ public void copyEntry(int toIndex, ValueVector from, int fromIndex) {
       }
     
       @Override
    -  public int getAllocatedByteCount() {
    -    return offsetVector.getAllocatedByteCount() + super.getAllocatedByteCount();
    +  public void getLedgers(Set<BufferLedger> ledgers) {
    +    offsetVector.getLedgers(ledgers);
    +    super.getLedgers(ledgers);
       }
     
       @Override
    -  public int getPayloadByteCount() {
    -    UInt${type.width}Vector.Accessor a = offsetVector.getAccessor();
    -    int count = a.getValueCount();
    -    if (count == 0) {
    +  public int getPayloadByteCount(int valueCount) {
    +    if (valueCount == 0) {
           return 0;
    -    } else {
    -      // If 1 or more values, then the last value is set to
    -      // the offset of the next value, which is the same as
    -      // the length of existing values.
    -      // In addition to the actual data bytes, we must also
    -      // include the "overhead" bytes: the offset vector entries
    -      // that accompany each column value. Thus, total payload
    -      // size is consumed text bytes + consumed offset vector
    -      // bytes.
    -      return a.get(count-1) + offsetVector.getPayloadByteCount();
         }
    +    // If 1 or more values, then the last value is set to
    +    // the offset of the next value, which is the same as
    +    // the length of existing values.
    +    // In addition to the actual data bytes, we must also
    +    // include the "overhead" bytes: the offset vector entries
    +    // that accompany each column value. Thus, total payload
    +    // size is consumed text bytes + consumed offset vector
    +    // bytes.
    +    return offsetVector.getAccessor().get(valueCount) +
    --- End diff --
    
    Remember that an offset vector holds one more entry than the number of
    values. If we have three entries, "a", "bb", and "ccc", then we have four
    offsets: 0, 1, 3, 6. The number of bytes used by the values is the last
    offset, the one in position 3, or 6. To that we must add the number of
    bytes consumed by the offset vector itself.
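
    To make the arithmetic concrete, here is a minimal standalone sketch. It
    is not the Drill implementation: it uses a plain int[] in place of the
    offset vector, and the class name, method names, and the 4-byte offset
    width are assumptions made for illustration. For values of length 1, 2,
    and 3 the cumulative offsets come out as 0, 1, 3, 6.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical stand-in for a variable-length vector; not Drill code.
public class OffsetVectorSketch {
  // Assumed width in bytes of one offset entry (as for a 4-byte unsigned offset).
  static final int OFFSET_WIDTH = 4;

  // Builds the offsets for the given values: entry i is the start of value i,
  // and the extra final entry is the total number of data bytes.
  static int[] buildOffsets(String... values) {
    int[] offsets = new int[values.length + 1];
    for (int i = 0; i < values.length; i++) {
      int length = values[i].getBytes(StandardCharsets.UTF_8).length;
      offsets[i + 1] = offsets[i] + length;
    }
    return offsets;
  }

  // Payload = data bytes (the offset at position valueCount) plus the bytes
  // used by the valueCount + 1 offset entries themselves.
  static int payloadByteCount(int[] offsets, int valueCount) {
    if (valueCount == 0) {
      return 0;
    }
    int dataBytes = offsets[valueCount];
    int offsetBytes = (valueCount + 1) * OFFSET_WIDTH;
    return dataBytes + offsetBytes;
  }

  public static void main(String[] args) {
    int[] offsets = buildOffsets("a", "bb", "ccc");
    System.out.println(Arrays.toString(offsets));      // [0, 1, 3, 6]
    System.out.println(payloadByteCount(offsets, 3));  // 6 + 4 * 4 = 22
  }
}
```

    Note that the data size is read at index valueCount, not valueCount - 1,
    because the first offset is always 0 and the last entry marks the end of
    the final value.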


> Rollup of External Sort memory management fixes
> -----------------------------------------------
>
>                 Key: DRILL-5601
>                 URL: https://issues.apache.org/jira/browse/DRILL-5601
>             Project: Apache Drill
>          Issue Type: Task
>    Affects Versions: 1.11.0
>            Reporter: Paul Rogers
>            Assignee: Paul Rogers
>             Fix For: 1.12.0
>
>
> Rollup of a set of specific JIRA entries that all relate to the very 
> difficult problem of managing memory within Drill in order for the external 
> sort to stay within a memory budget. In general, the fixes relate to better 
> estimating memory used by the three ways that Drill allocates vector memory 
> (see DRILL-5522) and to predicting the size of vectors that the sort will 
> create, to avoid repeated realloc-copy cycles (see DRILL-5594).



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
