[GitHub] drill pull request #1175: DRILL-6262: IndexOutOfBoundException in RecordBatc...

paul-rogers Mon, 19 Mar 2018 16:47:19 -0700

Github user paul-rogers commented on a diff in the pull request:

    https://github.com/apache/drill/pull/1175#discussion_r175619764
  
    --- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/record/RecordBatchSizer.java 
---
    @@ -321,10 +321,8 @@ public ColumnSize(ValueVector v, String prefix) {
     
               // Calculate pure data size.
               if (isVariableWidth) {
    -            UInt4Vector offsetVector = ((RepeatedValueVector) 
v).getOffsetVector();
    -            int innerValueCount = 
offsetVector.getAccessor().get(valueCount);
                 VariableWidthVector dataVector = ((VariableWidthVector) 
((RepeatedValueVector) v).getDataVector());
    -            totalDataSize = 
dataVector.getOffsetVector().getAccessor().get(innerValueCount);
    +            totalDataSize = dataVector.getCurrentSizeInBytes();
    --- End diff --
    
    Good improvement. The original code exposes far too much of the 
implementation.
    
    After all these changes, does the "dataSize" include both the offset vector 
and bytes? It should, else calls will be wrong. There are supposed to be three 
sizes:
    
    * Payload size: actual data bytes.
    * Data size: data + offsets + bits
    * Overall size: full length of all vectors.
    
    Payload size is what the user sees. Data size is how we calculate row width 
(since the rows must contain the overhead bytes). Vector length, here, only 
helps compute density, but is generated elsewhere. The point is, keep all three 
in mind, but keep the code separate. Otherwise, it is *very* easy to get 
confused and have the calculations blow up...

---

[GitHub] drill pull request #1175: DRILL-6262: IndexOutOfBoundException in RecordBatc...

Reply via email to