[jira] [Commented] (ARROW-2532) [C++] Add chunked builder classes

Wes McKinney (JIRA) Thu, 20 Dec 2018 20:58:36 -0800


    [ 
https://issues.apache.org/jira/browse/ARROW-2532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16726440#comment-16726440
 ]


Wes McKinney commented on ARROW-2532:
-------------------------------------

Looking at this a little bit. From a "cleanness" point of view I think we 
should explore option 2 a bit more thoroughly. If we introduced a "growth 
strategy" template argument then this could theoretically be done with good 
code reuse. 

There are some questions, such as how to handle the differing output types. In 
the chunking case, it's a vector of arrays. In the non-chunking case, it's a 
single array. So you'd have something like:

{code}
class Int16Builder : public ArrayBuilder,
                     public internal::NonChunkedBuilder<Int16Type> {
 public:
  using _Impl = internal::NonChunkedBuilder<Int16Type>;

  using _Impl::Append;
  using _Impl::AppendValues;
  // etc

  // Implement the ArrayBuilder public API
  Status Finish(std::shared_ptr<Array>* out) override;
};
{code}
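
To make the "growth strategy" idea a bit more concrete, here is a rough, self-contained sketch of how one builder template could serve both cases. This is not Arrow code: {{ToyArray}}, {{ToyInt16Builder}}, {{NonChunkedGrowth}} and {{ChunkedGrowth}} are hypothetical names invented for illustration, and the chunk limit is counted in values rather than bytes to keep things short. The strategy decides when to start a new chunk and what type {{Finish}} returns.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <utility>
#include <vector>

// Stand-in for arrow::Array (hypothetical, illustration only).
struct ToyArray {
  std::vector<int16_t> values;
};

using ToyArrayVector = std::vector<std::shared_ptr<ToyArray>>;

// Strategy 1: never start a new chunk, so the output is a single array.
struct NonChunkedGrowth {
  using Output = std::shared_ptr<ToyArray>;
  static bool ShouldFlush(std::size_t /*current_length*/) { return false; }
  static Output MakeOutput(ToyArrayVector chunks) {
    assert(chunks.size() == 1);  // holds because ShouldFlush is always false
    return chunks.front();
  }
};

// Strategy 2: start a new chunk once the current one holds kMaxLength values,
// so the output is a vector of arrays.
template <std::size_t kMaxLength>
struct ChunkedGrowth {
  using Output = ToyArrayVector;
  static bool ShouldFlush(std::size_t current_length) {
    return current_length >= kMaxLength;
  }
  static Output MakeOutput(ToyArrayVector chunks) { return chunks; }
};

// The append logic is written once; only the strategy differs.
template <typename Growth>
class ToyInt16Builder {
 public:
  void Append(int16_t value) {
    if (Growth::ShouldFlush(current_.size())) Flush();
    current_.push_back(value);
  }

  // The return type depends on the growth strategy.
  typename Growth::Output Finish() {
    // Emit the pending values; also emit one (possibly empty) array if
    // nothing has been flushed yet, so there is always at least one chunk.
    if (!current_.empty() || chunks_.empty()) Flush();
    typename Growth::Output out = Growth::MakeOutput(std::move(chunks_));
    chunks_.clear();  // leave the builder reusable
    return out;
  }

 private:
  void Flush() {
    auto arr = std::make_shared<ToyArray>();
    arr->values = std::move(current_);
    current_.clear();
    chunks_.push_back(std::move(arr));
  }

  std::vector<int16_t> current_;
  ToyArrayVector chunks_;
};
```

With this shape, {{ToyInt16Builder<NonChunkedGrowth>::Finish()}} yields one array and {{ToyInt16Builder<ChunkedGrowth<2>>::Finish()}} yields a vector of arrays, which is exactly the differing output type in question. A real version would still need to reconcile that with the single-array {{ArrayBuilder::Finish}} signature.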

I've suffered enough from {{extern template}} visibility issues that I'd like 
to avoid going down that rabbit hole if it can be avoided.

Obviously this is not urgent work -- I place it in the bucket of future 
performance optimization (I would guess that chunking at the 16MB/32MB 
granularity will improve the performance of many applications).

> [C++] Add chunked builder classes
> ---------------------------------
>
>                 Key: ARROW-2532
>                 URL: https://issues.apache.org/jira/browse/ARROW-2532
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: C++
>    Affects Versions: 0.9.0
>            Reporter: Antoine Pitrou
>            Priority: Major
>
> I think it would be useful to have chunked builders for list, string and 
> binary types. A chunked builder would produce a chunked array as output, 
> circumventing the 32-bit offset limit of those types. There's some 
> special-casing scattered around our NumPy conversion routines right now.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
