Github user paul-rogers commented on a diff in the pull request: https://github.com/apache/drill/pull/1125#discussion_r169860846 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/flatten/FlattenRecordBatch.java --- @@ -266,12 +270,18 @@ public void addComplexWriter(ComplexWriter writer) { complexWriters.add(writer); } - private boolean doAlloc() { - //Allocate vv in the allocationVectors. + private boolean doAlloc(int recordCount) { + + //Allocate v in the allocationVectors. for (ValueVector v : this.allocationVectors) { - if (!v.allocateNewSafe()) { - return false; - } + // build vector initializer for the column. + // This will iteratively include all nested columns underneath. + RecordBatchSizer.ColumnSize colSize = flattenMemoryManager.getColumnSize(v.getField().getName()); + VectorInitializer initializer = new VectorInitializer(); + colSize.buildVectorInitializer(initializer); + // Allocate memory for the vector. If it is map, it will allocate memory + // for all nested child columns as well. + initializer.allocateVector(v, "", recordCount); --- End diff -- While this code can be made to work, it does introduce a more complex path than was intended. The idea is that a `VectorInirializer` is a recursive structure. The top-level one holds hints for the row. Nested instances can hold data for maps. Because this class was meant to be temporary, it holds "hints": the information is used if available, defaults used if the hints are not available. So, a better approach would be to assemble a `VectorInitializer` for the output row in one step. Then, apply it to the entire row in another step. Further if we do that, we don't have to create initializer objects for every vector allocation; we can reuse the same set if the output row sizes don't change. Further, the code will be easier to reason about since we won't have two distinct paths.
---