[GitHub] drill pull request #1228: DRILL-6307: Handle empty batches in record batch s...

paul-rogers Wed, 25 Apr 2018 17:47:23 -0700

Github user paul-rogers commented on a diff in the pull request:

    https://github.com/apache/drill/pull/1228#discussion_r184244479
  
    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/record/RecordBatchSizer.java ---
    @@ -277,18 +286,29 @@ public boolean isRepeatedList() {
         /**
          * This is the average per entry width, used for vector allocation.
          */
    -    public int getEntryWidth() {
    +    private int getEntryWidthForAlloc() {
           int width = 0;
           if (isVariableWidth) {
    -        width = getNetSizePerEntry() - OFFSET_VECTOR_WIDTH;
    +        width = getAllocSizePerEntry() - OFFSET_VECTOR_WIDTH;
     
             // Subtract out the bits (is-set) vector width
    -        if (metadata.getDataMode() == DataMode.OPTIONAL) {
    +        if (isOptional) {
               width -= BIT_VECTOR_WIDTH;
             }
    +
    +        if (isRepeated && getValueCount() == 0) {
    +          return (safeDivide(width, STD_REPETITION_FACTOR));
    +        }
           }
     
    -      return (safeDivide(width, cardinality));
    +      return (safeDivide(width, getEntryCardinalityForAlloc()));
    +    }
    +
    +    /**
    +     * This is the average per entry cardinality, used for vector allocation.
    +     */
    +    private float getEntryCardinalityForAlloc() {
    +      return getCardinality() == 0 ? (isRepeated ? STD_REPETITION_FACTOR : 1) :getCardinality();
    --- End diff --
    
    Makes sense, but why would a batch be empty unless that path hit EOF? 
Short of EOF, an empty batch is most likely the result of an empty input file. 
We could just skip it and move to the next batch until we find one with data. 
Any reason the "get next batch" code can't just loop to become "get next 
non-empty batch" instead? Otherwise, we can't really do any effective batch 
sizing, as we have no data to go on...
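
    A minimal sketch of the "get next non-empty batch" idea, for illustration 
only. `Batch`, `rowCount`, and `nextNonEmptyBatch` are hypothetical names, not 
Drill's actual classes or methods, and the upstream is modeled as a plain 
`Iterator` rather than a real operator:

```java
import java.util.Arrays;
import java.util.Iterator;

class NonEmptyBatchSketch {

  /** Hypothetical stand-in for a record batch; only the row count matters here. */
  static final class Batch {
    final int rowCount;

    Batch(int rowCount) {
      this.rowCount = rowCount;
    }
  }

  /**
   * Pulls batches from upstream until one with at least one row is found.
   * Returns null when upstream is exhausted (the EOF case).
   */
  static Batch nextNonEmptyBatch(Iterator<Batch> upstream) {
    while (upstream.hasNext()) {
      Batch batch = upstream.next();
      if (batch.rowCount > 0) {
        return batch;
      }
      // Empty batch (for example, from an empty input file): skip it.
    }
    return null;
  }

  public static void main(String[] args) {
    Iterator<Batch> upstream = Arrays.asList(
        new Batch(0), new Batch(0), new Batch(3), new Batch(0)).iterator();

    Batch first = nextNonEmptyBatch(upstream);
    System.out.println("first non-empty batch rows: " + first.rowCount);
    System.out.println("reached EOF: " + (nextNonEmptyBatch(upstream) == null));
  }
}
```

    With a loop like this, the sizer would only ever see batches that carry 
data, so the zero-cardinality fallback would matter only if every input 
turned out to be empty.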

---
