[
https://issues.apache.org/jira/browse/ARROW-1943?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Rok Mihevc updated ARROW-1943:
------------------------------
External issue URL: https://github.com/apache/arrow/issues/17933
> Handle setInitialCapacity() for deeply nested lists of lists
> ------------------------------------------------------------
>
> Key: ARROW-1943
> URL: https://issues.apache.org/jira/browse/ARROW-1943
> Project: Apache Arrow
> Issue Type: Bug
> Components: Java
> Reporter: Siddharth Teotia
> Assignee: Siddharth Teotia
> Priority: Major
> Labels: pull-request-available
> Fix For: 0.9.0
>
>
> The current implementation of setInitialCapacity() multiplies the capacity
> by a factor of 5 at every list level. So for a schema of
> LIST(LIST(LIST(LIST(LIST(LIST(LIST(BIGINT))))))) with an initial capacity
> of 128, we end up throwing OversizedAllocationException from the
> BigIntVector: the capacity is increased 5x at every level, so by the time
> we reach the inner scalar vector that actually stores the data, we are
> well over the max size limit per vector (1 MB).
> We saw this problem in Dremio when we failed to read deeply nested JSON data.
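The growth described above can be sketched as follows. This is not the actual Arrow implementation; the class name, constants, and method are hypothetical, used only to illustrate why 128 initial values and a 5x multiplier per list level overshoot the 1 MB per-vector limit after seven levels of nesting.

```java
// Sketch only: models the capacity propagation described in the issue,
// not the real setInitialCapacity() code in Apache Arrow.
public class NestedCapacitySketch {
    static final long MAX_ALLOCATION_BYTES = 1L << 20; // 1 MiB per-vector limit
    static final int BIGINT_WIDTH_BYTES = 8;           // bytes per BIGINT value

    // Capacity that reaches the innermost vector: initial * 5^listDepth
    static long innerValueCount(long initialCapacity, int listDepth) {
        long count = initialCapacity;
        for (int i = 0; i < listDepth; i++) {
            count *= 5; // factor of 5 applied at every list level
        }
        return count;
    }

    public static void main(String[] args) {
        // 7 nested LIST levels around the BIGINT, starting from 128
        long values = innerValueCount(128, 7);   // 128 * 5^7 = 10,000,000
        long bytes = values * BIGINT_WIDTH_BYTES;
        System.out.println("inner value count: " + values);
        System.out.println("bytes needed: " + bytes
                + " (limit " + MAX_ALLOCATION_BYTES + ")");
    }
}
```

Under these assumptions the innermost BigIntVector would need roughly 80 MB, far past the 1 MB cap, which is why the allocation fails before any data is read.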
--
This message was sent by Atlassian Jira
(v8.20.10#820010)