[ 
https://issues.apache.org/jira/browse/ARROW-2532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16719127#comment-16719127
 ] 

Wes McKinney commented on ARROW-2532:
-------------------------------------

Not sure if this comment is prompted by my working on 
https://issues.apache.org/jira/browse/ARROW-3762 but I'm running into the need 
for this also.

For the time being, I have implemented very simple ChunkedBinaryBuilder and 
ChunkedStringBuilder classes for this: 
https://github.com/wesm/arrow/blob/ARROW-3762/cpp/src/arrow/columnar/builder_binary.h#L239
 (I took the liberty of reorganizing the builder code a little bit). I think 
these count as Option 4 as you've listed them. I have made them internal for 
now so that we have the option to do something more involved as you are 
describing. 
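
As a rough illustration of the idea only (this is not the code in the branch 
linked above), a chunked binary builder can be sketched with the standard 
library alone. The names SimpleChunkedBinaryBuilder, BinaryChunk and 
max_chunk_bytes are hypothetical, and the sketch assumes that no single value 
is larger than the chunk limit. A new chunk is started whenever the next value 
would push the current chunk past the limit, so the 32-bit offsets never 
overflow.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// One finished chunk: values are concatenated in `data`, and value i spans
// [offsets[i], offsets[i + 1]). Offsets are 32-bit, as for binary/string types.
struct BinaryChunk {
  std::vector<std::int32_t> offsets{0};
  std::string data;
  std::size_t length() const { return offsets.size() - 1; }
};

// Hypothetical stand-in for a chunked binary builder (not the Arrow class).
class SimpleChunkedBinaryBuilder {
 public:
  explicit SimpleChunkedBinaryBuilder(std::int32_t max_chunk_bytes)
      : max_chunk_bytes_(max_chunk_bytes) {
    assert(max_chunk_bytes > 0);
  }

  // Append one value, closing the current chunk first if it would not fit.
  // Assumption: a single value never exceeds max_chunk_bytes.
  void Append(const std::string& value) {
    const std::size_t limit = static_cast<std::size_t>(max_chunk_bytes_);
    assert(value.size() <= limit);
    if (current_.data.size() + value.size() > limit) {
      Flush();
    }
    current_.data += value;
    current_.offsets.push_back(
        static_cast<std::int32_t>(current_.data.size()));
  }

  // Close the current chunk and hand back every chunk built so far.
  std::vector<BinaryChunk> Finish() {
    Flush();
    return std::move(chunks_);
  }

 private:
  // Move the current chunk into the output list, skipping empty chunks.
  void Flush() {
    if (current_.length() == 0) return;
    chunks_.push_back(std::move(current_));
    current_ = BinaryChunk{};
  }

  std::int32_t max_chunk_bytes_;
  BinaryChunk current_;
  std::vector<BinaryChunk> chunks_;
};
```

With a limit of 8 bytes, appending "abcd", "efgh" and "ij" yields two chunks: 
the first holds two values and the second holds one. A real limit would be 
close to the largest offset a signed 32-bit integer can hold.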

My plan this week is to do the bare minimum of work (which is already quite a 
lot) for now to resolve ARROW-3762 and ARROW-2970.

What do you think we need beyond chunked builders for binary and string? There 
is also the issue of chunked fields in a struct, where the other fields have to 
be split up after the fact. 
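
To make the struct case concrete, here is a minimal sketch of splitting a 
sibling field after the fact, assuming the chunk lengths of the chunked field 
are already known. SplitToMatch is a hypothetical helper, not an Arrow 
function, and it copies values for simplicity where a real implementation 
would more likely take zero-copy slices.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: cut a contiguous sibling column into pieces whose
// lengths match the chunks produced for another field of the same struct.
std::vector<std::vector<std::int64_t>> SplitToMatch(
    const std::vector<std::int64_t>& column,
    const std::vector<std::size_t>& chunk_lengths) {
  std::vector<std::vector<std::int64_t>> pieces;
  std::size_t start = 0;
  for (std::size_t len : chunk_lengths) {
    // Each piece must lie entirely inside the sibling column.
    assert(start + len <= column.size());
    const auto first = column.begin() + static_cast<std::ptrdiff_t>(start);
    pieces.emplace_back(first, first + static_cast<std::ptrdiff_t>(len));
    start += len;
  }
  // The chunk lengths must cover the sibling column exactly.
  assert(start == column.size());
  return pieces;
}
```

For a five-element column and chunk lengths {2, 3}, this returns one piece 
with the first two values and one piece with the remaining three.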

> [C++] Add chunked builder classes
> ---------------------------------
>
>                 Key: ARROW-2532
>                 URL: https://issues.apache.org/jira/browse/ARROW-2532
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: C++
>    Affects Versions: 0.9.0
>            Reporter: Antoine Pitrou
>            Priority: Major
>
> I think it would be useful to have chunked builders for list, string and 
> binary types. A chunked builder would produce a chunked array as output, 
> circumventing the 32-bit offset limit of those types. There's some 
> special-casing scattered around our NumPy conversion routines right now.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
