[ https://issues.apache.org/jira/browse/DRILL-6231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16403220#comment-16403220 ]
ASF GitHub Bot commented on DRILL-6231:
---------------------------------------
Github user paul-rogers commented on a diff in the pull request:
https://github.com/apache/drill/pull/1171#discussion_r175244575
--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/record/RecordBatchSizer.java ---
@@ -395,11 +395,24 @@ private void allocateMap(AbstractMapVector map, int recordCount) {
}
}
+ private void allocateRepeatedList(RepeatedListVector vector, int recordCount) {
+ vector.allocateOffsetsNew(recordCount);
+ recordCount *= getCardinality();
+ ColumnSize child = children.get(vector.getField().getName());
+ child.allocateVector(vector.getDataVector(), recordCount);
--- End diff --
One interesting feature of this vector is that the child can be null for some
time during reading. That is, in JSON, we may see that the field is `foo: [[]]`
but not yet know the inner type. So, for safety, allocate the inner vector only
if `vector.getDataVector()` is non-null.
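A minimal sketch of the suggested null guard. It uses hypothetical, simplified stand-in classes (`SketchVector`, `SketchRepeatedListVector`, `RepeatedListAllocSketch`), not Drill's real `RepeatedListVector` API, and it assumes the offsets need one slot more than the record count.

```java
// Hypothetical, simplified stand-ins for Drill's vectors; not Drill's real API.
class SketchVector {
  int allocatedValues = -1; // -1 means "not yet allocated"

  void allocate(int valueCount) {
    allocatedValues = valueCount;
  }
}

class SketchRepeatedListVector {
  // Assumption: one more offset than records, as in a typical offset vector.
  int allocatedOffsets = -1;
  // Stays null while the reader has seen only something like `foo: [[]]`.
  SketchVector dataVector;

  void allocateOffsets(int recordCount) {
    allocatedOffsets = recordCount + 1;
  }
}

class RepeatedListAllocSketch {
  // Allocate the outer offsets, then the inner vector only if it exists.
  static void allocateRepeatedList(SketchRepeatedListVector vector, int recordCount, int cardinality) {
    vector.allocateOffsets(recordCount);
    SketchVector inner = vector.dataVector;
    if (inner == null) {
      return; // inner type unknown so far: nothing more to allocate yet
    }
    inner.allocate(recordCount * cardinality);
  }
}
```

With no inner vector, only the offsets are allocated; once the reader has created the inner vector, the same call sizes it as `recordCount * cardinality`.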
Also note that a repeated list can be of any dimension, so the inner vector
can be another repeated list of lesser dimension. The code here handles that
case. But does the sizer itself handle nested repeated lists? Do we have unit
tests for 2D and 3D lists?
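To illustrate what nested dimensions mean for sizing, here is a sketch that walks a chain of list dimensions and multiplies the parent count by each level's average cardinality, much as the diff does with `recordCount *= getCardinality()`. `ListDim` and `NestedListSizing` are hypothetical names, not Drill classes, and rounding up with `Math.ceil` is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model of one dimension of a (possibly nested) repeated list. */
final class ListDim {
  final double cardinality; // assumed average number of entries per parent value
  final ListDim inner;      // next dimension down, or null if the next level is scalar data

  ListDim(double cardinality, ListDim inner) {
    this.cardinality = cardinality;
    this.inner = inner;
  }
}

final class NestedListSizing {
  /**
   * Returns the number of values to allocate at each level below the records:
   * each level's count is the parent count times that level's cardinality.
   */
  static List<Integer> valueCounts(int recordCount, ListDim dim) {
    List<Integer> counts = new ArrayList<>();
    int count = recordCount;
    for (ListDim d = dim; d != null; d = d.inner) {
      count = (int) Math.ceil(count * d.cardinality);
      counts.add(count);
    }
    return counts;
  }
}
```

For 10 records of a 2D list with cardinalities 4 and 3, this gives 40 inner lists and 120 scalar values; wrapping it in an outer dimension of cardinality 2 gives 20, 80, and 240. Counts like these could serve as expectations in 2D and 3D sizer tests.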
We never had to do this before because only JSON can produce such structures,
and we don't seem to exercise most operators with complex JSON structures. We
probably should.
> Fix memory allocation for repeated list vector
> ----------------------------------------------
>
> Key: DRILL-6231
> URL: https://issues.apache.org/jira/browse/DRILL-6231
> Project: Apache Drill
> Issue Type: Improvement
> Components: Execution - Flow
> Affects Versions: 1.13.0
> Reporter: Padma Penumarthy
> Assignee: Padma Penumarthy
> Priority: Critical
> Fix For: 1.14.0
>
>
> Vector allocation in the record batch sizer can be enhanced to allocate memory
> for the repeated list vector more accurately, rather than using default functions.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)