Siddharth Teotia created ARROW-1943:
---------------------------------------
Summary: Handle setInitialCapacity() for deeply nested lists of lists
Key: ARROW-1943
URL: https://issues.apache.org/jira/browse/ARROW-1943
Project: Apache Arrow
Issue Type: Bug
Reporter: Siddharth Teotia
Assignee: Siddharth Teotia
The current implementation of setInitialCapacity() multiplies the capacity by a
factor of 5 at every level of list nesting.
So if the schema is LIST (LIST (LIST (LIST (LIST (LIST (LIST (BIGINT))))))) and
we start with an initial capacity of 128, we end up throwing
OversizedAllocationException from the BigIntVector: at every level the capacity
is multiplied by 5, and by the time we reach the inner scalar vector that
actually stores the data, we are well over the max size limit per vector (1MB).
We saw this problem in Dremio when we failed to read deeply nested JSON data.
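
To make the arithmetic concrete, here is a minimal standalone sketch of how the
capacity grows with nesting depth. It does not call the Arrow API. The class
NestedListCapacity, its constants, and its methods are hypothetical names used
only for illustration. The factor of 5 and the 1MB limit are taken from the
description above, and 8 bytes per BIGINT value is assumed.

```java
public class NestedListCapacity {
    // Assumption from the description: each list level multiplies capacity by 5.
    static final int LIST_FACTOR = 5;
    // Assumption from the description: max size limit per vector is 1MB.
    static final long MAX_VECTOR_BYTES = 1L << 20;
    // Assumption: a BIGINT value occupies 8 bytes.
    static final int BIGINT_WIDTH = 8;

    // Value count requested from the innermost scalar vector after
    // descending through listDepth levels of list nesting.
    static long innerCapacity(long initialCapacity, int listDepth) {
        long capacity = initialCapacity;
        for (int i = 0; i < listDepth; i++) {
            capacity = Math.multiplyExact(capacity, (long) LIST_FACTOR);
        }
        return capacity;
    }

    // True if the innermost BIGINT data buffer would exceed the assumed limit.
    static boolean exceedsLimit(long initialCapacity, int listDepth) {
        long bytes = Math.multiplyExact(
            innerCapacity(initialCapacity, listDepth), (long) BIGINT_WIDTH);
        return bytes > MAX_VECTOR_BYTES;
    }

    public static void main(String[] args) {
        // Print the growth for the 7-level schema from the description.
        for (int depth = 0; depth <= 7; depth++) {
            long values = innerCapacity(128, depth);
            System.out.printf("depth=%d values=%d bytes=%d exceeds=%b%n",
                depth, values, values * BIGINT_WIDTH, exceedsLimit(128, depth));
        }
    }
}
```

Under these assumptions, a starting capacity of 128 with 7 levels of nesting
requests 128 * 5^7 = 10,000,000 values (80,000,000 bytes) from the inner
vector, and the 1MB limit is already crossed at 5 levels of nesting.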
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)