[
https://issues.apache.org/jira/browse/ARROW-2532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16719127#comment-16719127
]
Wes McKinney commented on ARROW-2532:
-------------------------------------
Not sure if this comment is prompted by my working on
https://issues.apache.org/jira/browse/ARROW-3762 but I'm running into the need
for this also.
For the time being, I have implemented a very simple ChunkedBinaryBuilder and
ChunkedStringBuilder for this:
https://github.com/wesm/arrow/blob/ARROW-3762/cpp/src/arrow/columnar/builder_binary.h#L239
(I took the liberty of reorganizing the builder code a little bit). I think
these count as Option 4 as you've listed them. I have made them internal for
now so that we have the option to do something more involved as you are
describing.
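To make the idea concrete, here is a minimal standalone sketch of what such a
chunked builder does. It is not the code in the linked branch and does not use
the Arrow API: BinaryChunk, SimpleChunkedBinaryBuilder and the max_chunk_bytes
parameter are hypothetical names chosen for illustration. The assumption is
that a chunk is closed and a new one started whenever the next value would push
the chunk's data past what an int32 offset can address.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for one binary array chunk: concatenated value bytes
// plus int32 offsets (offsets[i]..offsets[i+1] delimits value i).
struct BinaryChunk {
  std::string data;
  std::vector<int32_t> offsets{0};
  int64_t length() const { return static_cast<int64_t>(offsets.size()) - 1; }
};

// Illustrative only; not the class from the linked branch.
class SimpleChunkedBinaryBuilder {
 public:
  explicit SimpleChunkedBinaryBuilder(
      int64_t max_chunk_bytes = std::numeric_limits<int32_t>::max())
      : max_chunk_bytes_(max_chunk_bytes) {
    assert(max_chunk_bytes_ > 0 &&
           max_chunk_bytes_ <= std::numeric_limits<int32_t>::max());
  }

  // Returns false if a single value is too large to fit in any chunk.
  bool Append(const std::string& value) {
    const int64_t size = static_cast<int64_t>(value.size());
    if (size > max_chunk_bytes_) return false;
    // Close the current chunk when this value would overflow it.
    if (static_cast<int64_t>(current_.data.size()) + size > max_chunk_bytes_) {
      Flush();
    }
    current_.data.append(value);
    current_.offsets.push_back(static_cast<int32_t>(current_.data.size()));
    return true;
  }

  // Hands back all chunks built so far and resets the builder.
  std::vector<BinaryChunk> Finish() {
    if (current_.length() > 0) Flush();
    std::vector<BinaryChunk> out = std::move(chunks_);
    chunks_.clear();
    return out;
  }

 private:
  void Flush() {
    chunks_.push_back(std::move(current_));
    current_ = BinaryChunk{};
  }

  int64_t max_chunk_bytes_;
  BinaryChunk current_;
  std::vector<BinaryChunk> chunks_;
};
```

Passing a small max_chunk_bytes (say 10) makes the chunking behavior easy to
exercise in a test without allocating 2 GB of data.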
My plan this week is to do only the bare minimum of work (which is already
quite a lot) needed to resolve ARROW-3762 and ARROW-2970.
What do you think we need beyond chunked builders for binary and string? There
is also the issue of chunked fields in a struct, where the other fields have to
be split up after the fact.
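As a rough sketch of the struct case: once the chunked field has fixed the
chunk boundaries, each sibling field has to be cut at the same row counts.
SplitLikeChunks below is a hypothetical helper operating on plain std::vector
and copying the data; with real Arrow arrays one would presumably take
zero-copy slices instead.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Splits `values` into consecutive pieces whose sizes are given by
// `chunk_lengths` (e.g. the row counts of the chunks produced for a
// string field in the same struct).
template <typename T>
std::vector<std::vector<T>> SplitLikeChunks(
    const std::vector<T>& values, const std::vector<int64_t>& chunk_lengths) {
  std::vector<std::vector<T>> pieces;
  pieces.reserve(chunk_lengths.size());
  int64_t start = 0;
  for (int64_t len : chunk_lengths) {
    // Each piece must lie entirely within the input.
    assert(len >= 0 && start + len <= static_cast<int64_t>(values.size()));
    pieces.emplace_back(values.begin() + start, values.begin() + start + len);
    start += len;
  }
  // The chunk lengths must account for every row exactly once.
  assert(start == static_cast<int64_t>(values.size()));
  return pieces;
}
```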
> [C++] Add chunked builder classes
> ---------------------------------
>
> Key: ARROW-2532
> URL: https://issues.apache.org/jira/browse/ARROW-2532
> Project: Apache Arrow
> Issue Type: Improvement
> Components: C++
> Affects Versions: 0.9.0
> Reporter: Antoine Pitrou
> Priority: Major
>
> I think it would be useful to have chunked builders for list, string and
> binary types. A chunked builder would produce a chunked array as output,
> circumventing the 32-bit offset limit of those types. There's some
> special-casing scattered around our NumPy conversion routines right now.