Github user paul-rogers commented on a diff in the pull request:

    https://github.com/apache/drill/pull/1228#discussion_r183264768
  
    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/record/RecordBatchSizer.java ---
    @@ -277,18 +286,29 @@ public boolean isRepeatedList() {
         /**
          * This is the average per entry width, used for vector allocation.
          */
    -    public int getEntryWidth() {
    +    private int getEntryWidthForAlloc() {
           int width = 0;
           if (isVariableWidth) {
    -        width = getNetSizePerEntry() - OFFSET_VECTOR_WIDTH;
    +        width = getAllocSizePerEntry() - OFFSET_VECTOR_WIDTH;
     
             // Subtract out the bits (is-set) vector width
    -        if (metadata.getDataMode() == DataMode.OPTIONAL) {
    +        if (isOptional) {
               width -= BIT_VECTOR_WIDTH;
             }
    +
    +        if (isRepeated && getValueCount() == 0) {
    +          return (safeDivide(width, STD_REPETITION_FACTOR));
    +        }
           }
     
    -      return (safeDivide(width, cardinality));
    +      return (safeDivide(width, getEntryCardinalityForAlloc()));
    +    }
    +
    +    /**
    +     * This is the average per entry cardinality, used for vector allocation.
    +     */
    +    private float getEntryCardinalityForAlloc() {
    +      return getCardinality() == 0 ? (isRepeated ? STD_REPETITION_FACTOR : 1) :getCardinality();
    --- End diff --
    
    I'm a bit curious: under what scenario do we want to allocate vectors given
    no input rows? Can we just wait to see data before we do our allocations? It
    is very hard to make reasonable estimates of future size based on no data...
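
For reference, the zero-row fallback in the diff above can be sketched as standalone Java. This is only a rough sketch, not the Drill implementation: the constant values and the rounding behavior of `safeDivide` are assumptions made for illustration, and the sizer's fields are passed in as plain parameters on a hypothetical `EntryWidthSketch` class.

```java
// Minimal sketch of the fallback logic in the diff, outside of Drill.
final class EntryWidthSketch {
    // Assumed values for illustration only; the diff does not show them.
    static final int OFFSET_VECTOR_WIDTH = 4;
    static final int BIT_VECTOR_WIDTH = 1;
    static final float STD_REPETITION_FACTOR = 10f;

    // Hypothetical stand-in for safeDivide: rounds up and
    // returns 0 instead of dividing by zero.
    static int safeDivide(int num, float denom) {
        if (denom == 0) {
            return 0;
        }
        return (int) Math.ceil(num / denom);
    }

    // With no observed cardinality, fall back to a guess:
    // the standard repetition factor for repeated columns, else 1.
    static float entryCardinalityForAlloc(float cardinality, boolean isRepeated) {
        if (cardinality == 0) {
            return isRepeated ? STD_REPETITION_FACTOR : 1f;
        }
        return cardinality;
    }

    static int entryWidthForAlloc(int allocSizePerEntry, boolean isVariableWidth,
                                  boolean isOptional, boolean isRepeated,
                                  int valueCount, float cardinality) {
        int width = 0;
        if (isVariableWidth) {
            width = allocSizePerEntry - OFFSET_VECTOR_WIDTH;
            if (isOptional) {
                // Subtract out the bits (is-set) vector width.
                width -= BIT_VECTOR_WIDTH;
            }
            if (isRepeated && valueCount == 0) {
                // No values seen yet: divide by the assumed repetition factor.
                return safeDivide(width, STD_REPETITION_FACTOR);
            }
        }
        return safeDivide(width, entryCardinalityForAlloc(cardinality, isRepeated));
    }

    public static void main(String[] args) {
        // Repeated column, no rows seen: (54 - 4) / 10
        System.out.println(entryWidthForAlloc(54, true, false, true, 0, 0f));      // prints 5
        // Repeated column with observed cardinality 2.5: (54 - 4) / 2.5
        System.out.println(entryWidthForAlloc(54, true, false, true, 100, 2.5f));  // prints 20
    }
}
```

In the first call the result depends entirely on the assumed repetition factor, which is the estimate-from-no-data case the question above is about.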

