[
https://issues.apache.org/jira/browse/ARROW-15554?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17487136#comment-17487136
]
Sarah Gilmore commented on ARROW-15554:
---------------------------------------
Hi [~apitrou],
I was thinking more about the future when I created this Jira issue. I don't
have a concrete need right now, but I can picture a few scenarios in which the
size limitation imposed by MapArray's 32-bit offsets cannot be worked around.
*Scenario 1:*
Suppose you have a ListArray of MapArrays. If one of the maps requires more
than int32::max (2^31 - 1) key-value pairs, there is currently no way to
represent it. You could try using a ChunkedArray, but you would still need to
split the large map across multiple rows in the list.
*Scenario 2:*
Even if the MapArray is at the top of the object hierarchy, the same problem
could potentially arise if a row within the array needs to contain more than
int32::max key-value pairs. You could try to use a ChunkedArray to resolve the
issue, but the key-value pairs would still be split across multiple rows.
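
To make the two scenarios a bit more concrete, here is a rough sketch in plain Python. It does not use the Arrow API; the function names (last_offset, fits_offsets, split_into_chunks) are made up for illustration. It only assumes that a map column's offsets are the running total of key-value pairs per row, which is how the list-like layouts work. It shows that chunking by rows helps when the total is too large, but not when a single row is.

```python
# Illustrative sketch only: plain Python, not the Arrow API.
# All function names here are hypothetical.

INT32_MAX = 2**31 - 1  # largest value a signed 32-bit offset can hold
INT64_MAX = 2**63 - 1  # largest value a signed 64-bit offset can hold


def last_offset(entries_per_row):
    """Return the final offset: the running total of key-value pairs."""
    total = 0
    for count in entries_per_row:
        total += count
    return total


def fits_offsets(entries_per_row, max_offset):
    """True if every offset for these rows fits within max_offset."""
    return last_offset(entries_per_row) <= max_offset


def split_into_chunks(entries_per_row, max_offset=INT32_MAX):
    """Greedily group whole rows into chunks that each fit within max_offset.

    Raises ValueError if one row alone exceeds the limit, because a row
    cannot be spread over several chunks without becoming several rows.
    """
    chunks, current, total = [], [], 0
    for count in entries_per_row:
        if count > max_offset:
            raise ValueError("a single row exceeds the offset limit")
        if total + count > max_offset:
            # Close the current chunk and start a new one with this row.
            chunks.append(current)
            current, total = [], 0
        current.append(count)
        total += count
    if current:
        chunks.append(current)
    return chunks


if __name__ == "__main__":
    rows = [INT32_MAX - 1, 5]
    print(fits_offsets(rows, INT32_MAX))  # False: total exceeds int32
    print(fits_offsets(rows, INT64_MAX))  # True: 64-bit offsets are enough
    print(len(split_into_chunks(rows)))   # 2: chunking by rows works here
```

In this sketch, calling split_into_chunks([INT32_MAX + 1]) raises ValueError, which corresponds to the case described above: one map with more than int32::max key-value pairs cannot be kept in a single row, whereas 64-bit offsets would accommodate it.
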
I've seen Parquet files with MAP columns, and I can imagine a situation in
which someone has a very large MAP as the top-most data structure or within a
nested one. It is probably rare for someone to be unable to represent their
data with MapArrays, but it's not impossible given the size limit of int32
offsets.
I'd honestly be interested in looking into this myself.
I hope this helps.
Best,
Sarah
> [Format][C++] Add "LargeMap" type with 64-bit offsets
> -----------------------------------------------------
>
> Key: ARROW-15554
> URL: https://issues.apache.org/jira/browse/ARROW-15554
> Project: Apache Arrow
> Issue Type: Improvement
> Components: C++, Format
> Reporter: Sarah Gilmore
> Priority: Major
>
> It would be nice if a "LargeMap" type existed alongside the "Map" type for
> parity. For other datatypes that require offset arrays/buffers, such as
> String, List, and BinaryArray, Arrow provides a "large" version, i.e.
> LargeString, LargeList, and LargeBinaryArray. Map is the exception, so a
> "LargeMap" with 64-bit offsets would complete the set.