Github user paul-rogers commented on a diff in the pull request:
https://github.com/apache/drill/pull/1125#discussion_r169860846
--- Diff:
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/flatten/FlattenRecordBatch.java
---
@@ -266,12 +270,18 @@ public void addComplexWriter(ComplexWriter writer) {
complexWriters.add(writer);
}
- private boolean doAlloc() {
- //Allocate vv in the allocationVectors.
+ private boolean doAlloc(int recordCount) {
+
+ //Allocate v in the allocationVectors.
for (ValueVector v : this.allocationVectors) {
- if (!v.allocateNewSafe()) {
- return false;
- }
+ // build vector initializer for the column.
+ // This will iteratively include all nested columns underneath.
+ RecordBatchSizer.ColumnSize colSize = flattenMemoryManager.getColumnSize(v.getField().getName());
+ VectorInitializer initializer = new VectorInitializer();
+ colSize.buildVectorInitializer(initializer);
+ // Allocate memory for the vector. If it is map, it will allocate memory
+ // for all nested child columns as well.
+ initializer.allocateVector(v, "", recordCount);
--- End diff --
While this code can be made to work, it introduces a more complex path
than was intended. The idea is that a `VectorInitializer` is a recursive
structure. The top-level instance holds hints for the row. Nested instances
hold hints for maps.
Because this class was meant to be temporary, it holds "hints": the
information is used if available, and defaults are used otherwise.
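To illustrate the recursive-hints idea, here is a minimal, self-contained sketch. It is not Drill's `VectorInitializer`: the class `HintNode`, its methods, and the `DEFAULT_WIDTH` value are all hypothetical names chosen for this example. It only shows the shape of the structure: a top-level node for the row, nested nodes for maps, and a default when no hint was recorded.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the idea described above; not Drill's VectorInitializer.
class HintNode {
  static final int DEFAULT_WIDTH = 50;   // assumed fallback, illustrative only

  private final Map<String, Integer> widths = new HashMap<>();
  private final Map<String, HintNode> maps = new HashMap<>();

  // Record a width hint (in bytes) for a column at this level.
  void width(String column, int bytes) { widths.put(column, bytes); }

  // Get or create the nested node that holds hints for a map column.
  HintNode map(String column) {
    return maps.computeIfAbsent(column, k -> new HintNode());
  }

  // Resolve a dotted path such as "m.b"; fall back to the default when no hint exists.
  int widthOf(String path) {
    int dot = path.indexOf('.');
    if (dot < 0) {
      return widths.getOrDefault(path, DEFAULT_WIDTH);
    }
    HintNode child = maps.get(path.substring(0, dot));
    return child == null ? DEFAULT_WIDTH : child.widthOf(path.substring(dot + 1));
  }
}
```

With this shape, a lookup for a column that was never described still succeeds and simply returns the default.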
So, a better approach would be to assemble a `VectorInitializer` for the
output row in one step, then apply it to the entire row in another step.
If we do that, we don't have to create initializer objects for
every vector allocation; we can reuse the same set as long as the output row
sizes don't change.
The code will also be easier to reason about, since we won't have two
distinct paths.
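A rough sketch of the two-step flow, under stated assumptions: `RowPlan` and `PlanCache` are hypothetical names, the "allocation" is reduced to computing byte counts, and none of this uses Drill's actual classes. It is only meant to show building the plan once, applying it to the whole row, and rebuilding only when the sizes change.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: RowPlan and PlanCache are illustrative names, not Drill classes.
class RowPlan {
  final Map<String, Integer> widths;   // per-column width hints in bytes

  RowPlan(Map<String, Integer> widths) {
    this.widths = new LinkedHashMap<>(widths);
  }

  // Step 2: apply the plan to the entire row in one pass.
  // Returns the bytes to reserve per column for recordCount rows.
  Map<String, Long> allocate(int recordCount) {
    Map<String, Long> bytes = new LinkedHashMap<>();
    widths.forEach((col, w) -> bytes.put(col, (long) w * recordCount));
    return bytes;
  }
}

class PlanCache {
  private Map<String, Integer> lastSizes;
  private RowPlan plan;
  int builds;   // counts rebuilds, for demonstration only

  // Step 1: assemble the plan once; rebuild only when the sizes change.
  RowPlan planFor(Map<String, Integer> sizes) {
    if (plan == null || !sizes.equals(lastSizes)) {
      lastSizes = new LinkedHashMap<>(sizes);   // defensive copy for later comparison
      plan = new RowPlan(sizes);
      builds++;
    }
    return plan;
  }
}
```

Calling `planFor` twice with unchanged sizes returns the same `RowPlan` instance, so nothing new is created per allocation.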
---