[GitHub] drill pull request #1125: DRILL-6126: Allocate memory for value vectors upfr...

paul-rogers Wed, 21 Feb 2018 22:00:07 -0800

Github user paul-rogers commented on a diff in the pull request:

    https://github.com/apache/drill/pull/1125#discussion_r169860846
  
    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/flatten/FlattenRecordBatch.java ---
    @@ -266,12 +270,18 @@ public void addComplexWriter(ComplexWriter writer) {
         complexWriters.add(writer);
       }
     
    -  private boolean doAlloc() {
    -    //Allocate vv in the allocationVectors.
    +  private boolean doAlloc(int recordCount) {
    +
    +    //Allocate v in the allocationVectors.
         for (ValueVector v : this.allocationVectors) {
    -      if (!v.allocateNewSafe()) {
    -        return false;
    -      }
    +      // build vector initializer for the column.
    +      // This will iteratively include all nested columns underneath.
    +      RecordBatchSizer.ColumnSize colSize = flattenMemoryManager.getColumnSize(v.getField().getName());
    +      VectorInitializer initializer = new VectorInitializer();
    +      colSize.buildVectorInitializer(initializer);
    +      // Allocate memory for the vector. If it is map, it will allocate memory
    +      // for all nested child columns as well.
    +      initializer.allocateVector(v, "", recordCount);
    --- End diff --
    
    While this code can be made to work, it introduces a more complex path 
than was intended. The idea is that a `VectorInitializer` is a recursive 
structure: the top-level instance holds hints for the row, and nested instances 
can hold hints for maps.
    
    Because this class was meant to be temporary, it holds "hints": the 
information is used if available, and defaults are used otherwise.
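
    To make the "recursive hints with defaults" idea concrete, here is a 
minimal sketch. It is not Drill's `VectorInitializer`: the class 
`AllocationHints`, its methods, and the default width of 50 bytes are all 
hypothetical and chosen only for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a recursive hint holder; NOT Drill's VectorInitializer.
class AllocationHints {
  // Assumed fallback width, used when no hint was recorded for a column.
  static final int DEFAULT_WIDTH = 50;

  // Per-column width hints at this level of the row.
  private final Map<String, Integer> widths = new HashMap<>();
  // Nested hint holders, one per map column.
  private final Map<String, AllocationHints> children = new HashMap<>();

  // Records a width hint (in bytes) for a column at this level.
  void width(String column, int bytes) {
    widths.put(column, bytes);
  }

  // Returns the nested hints for a map column, creating them on first use.
  AllocationHints child(String mapColumn) {
    return children.computeIfAbsent(mapColumn, k -> new AllocationHints());
  }

  // Returns the hinted width if one exists, the default otherwise.
  int widthOf(String column) {
    return widths.getOrDefault(column, DEFAULT_WIDTH);
  }
}

class AllocationHintsDemo {
  public static void main(String[] args) {
    AllocationHints row = new AllocationHints();   // top level: hints for the row
    row.width("a", 4);
    row.child("m").width("b", 20);                 // nested level: hints for map "m"

    System.out.println(row.widthOf("a"));            // hinted value
    System.out.println(row.child("m").widthOf("b")); // hinted value in the nested map
    System.out.println(row.child("m").widthOf("c")); // no hint, so the default applies
  }
}
```

    A caller never has to check whether a hint exists; a missing hint simply 
falls back to the default.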
    
    So, a better approach would be to assemble a `VectorInitializer` for the 
output row in one step. Then, apply it to the entire row in another step.
    
    Further, if we do that, we don't have to create initializer objects for 
every vector allocation; we can reuse the same set as long as the output row 
sizes don't change.
    
    The code will also be easier to reason about, since we won't have two 
distinct paths.
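
    The two-step shape suggested above (assemble once for the output row, 
then apply to the whole row, reusing the same object across batches) could 
look roughly like the sketch below. It does not use Drill's classes: 
`SketchColumn`, `RowAllocationPlan`, the dotted-path keys, and the 50-byte 
default are hypothetical, and the plan is kept flat rather than nested for 
brevity.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of an output column; a map column has children.
class SketchColumn {
  final String name;
  final List<SketchColumn> children = new ArrayList<>();
  int allocatedBytes; // bytes reserved by the most recent allocation

  SketchColumn(String name, SketchColumn... kids) {
    this.name = name;
    children.addAll(Arrays.asList(kids));
  }
}

// Hypothetical row-level plan: built once, then applied to the entire row.
class RowAllocationPlan {
  static final int DEFAULT_WIDTH = 50; // assumed fallback when no hint exists
  private final Map<String, Integer> widthByPath = new HashMap<>();

  // Step 1: record hints for the output row, keyed by dotted path ("m.b").
  RowAllocationPlan hint(String path, int bytes) {
    widthByPath.put(path, bytes);
    return this;
  }

  // Step 2: a single call covers every column of the row, nested ones included.
  void allocateRow(List<SketchColumn> row, int recordCount) {
    for (SketchColumn col : row) {
      allocate(col, "", recordCount);
    }
  }

  private void allocate(SketchColumn col, String prefix, int recordCount) {
    String path = prefix + col.name;
    if (col.children.isEmpty()) {
      // Leaf column: hinted width if present, default otherwise.
      col.allocatedBytes = widthByPath.getOrDefault(path, DEFAULT_WIDTH) * recordCount;
    } else {
      // Map column: recurse into the children with an extended path prefix.
      for (SketchColumn child : col.children) {
        allocate(child, path + ".", recordCount);
      }
    }
  }
}

class RowAllocationPlanDemo {
  public static void main(String[] args) {
    SketchColumn a = new SketchColumn("a");
    SketchColumn b = new SketchColumn("b");
    SketchColumn c = new SketchColumn("c");
    List<SketchColumn> row = Arrays.asList(a, new SketchColumn("m", b, c));

    // Built once for the output row...
    RowAllocationPlan plan = new RowAllocationPlan().hint("a", 4).hint("m.b", 20);

    // ...and reused for each batch while the row sizes stay the same.
    plan.allocateRow(row, 1000);
    plan.allocateRow(row, 500);
    System.out.println(a.allocatedBytes + " " + b.allocatedBytes + " " + c.allocatedBytes);
  }
}
```

    With this shape there is one allocation path for the whole row, and no 
per-vector objects are created inside the allocation loop.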
