[
https://issues.apache.org/jira/browse/ARROW-2532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16726440#comment-16726440
]
Wes McKinney commented on ARROW-2532:
-------------------------------------
Looking at this a little bit. From a "cleanness" point of view I think we
should explore option 2 a bit more thoroughly. If we introduced a "growth
strategy" template argument then this could theoretically be done with good
code reuse.
There are some questions, like how to handle the differing output types. In the
chunking case, it's a vector of arrays. In the non-chunking case, it's a single
array. So you'd have something like:
{code}
class Int16Builder : public ArrayBuilder,
                     public internal::NonChunkedBuilder<Int16Type> {
 public:
  using _Impl = internal::NonChunkedBuilder<Int16Type>;
  using _Impl::Append;
  using _Impl::AppendValues;
  // etc

  // Implement the ArrayBuilder public API
  Status Finish(std::shared_ptr<Array>* out) override;
};
{code}
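To make the "growth strategy" idea a bit more concrete, here is a rough standalone sketch that uses plain {{std::vector}} in place of Arrow buffers and arrays. All of the names in it ({{BasicBuilder}}, {{NonChunkedGrowth}}, {{ChunkedGrowth}}, {{max_chunk_length}}) are hypothetical and are not existing Arrow APIs; it only illustrates how the append logic might be shared while the strategy decides the output type.

```cpp
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical strategy: all values go into one contiguous array.
template <typename T>
class NonChunkedGrowth {
 public:
  using OutputType = std::vector<T>;  // stands in for a single Array

  void Append(T value) { data_.push_back(value); }

  OutputType Finish() {
    OutputType out = std::move(data_);
    data_.clear();  // leave the strategy reusable after Finish
    return out;
  }

 private:
  std::vector<T> data_;
};

// Hypothetical strategy: start a new chunk once the current one is full.
template <typename T>
class ChunkedGrowth {
 public:
  using OutputType = std::vector<std::vector<T>>;  // stands in for a vector of Arrays

  explicit ChunkedGrowth(std::size_t max_chunk_length = 4)
      : max_chunk_length_(max_chunk_length) {}

  void Append(T value) {
    if (current_.size() == max_chunk_length_) Flush();
    current_.push_back(value);
  }

  OutputType Finish() {
    if (!current_.empty()) Flush();
    OutputType out = std::move(chunks_);
    chunks_.clear();
    return out;
  }

 private:
  // Move the current chunk into the list of finished chunks.
  void Flush() {
    chunks_.push_back(std::move(current_));
    current_.clear();
  }

  std::size_t max_chunk_length_;
  std::vector<T> current_;
  OutputType chunks_;
};

// The append API is written once; Growth determines what Finish returns.
template <typename T, typename Growth>
class BasicBuilder {
 public:
  using OutputType = typename Growth::OutputType;

  explicit BasicBuilder(Growth growth = Growth()) : growth_(std::move(growth)) {}

  void Append(T value) { growth_.Append(value); }

  void AppendValues(const T* values, std::size_t length) {
    for (std::size_t i = 0; i < length; ++i) growth_.Append(values[i]);
  }

  OutputType Finish() { return growth_.Finish(); }

 private:
  Growth growth_;
};
```

With this shape, {{BasicBuilder<std::int16_t, NonChunkedGrowth<std::int16_t>>}} yields a single vector from {{Finish}}, while the {{ChunkedGrowth}} variant yields a vector of vectors. The differing {{OutputType}} is exactly the part that would still need to be reconciled with the {{ArrayBuilder}} public API.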
I've suffered enough from {{extern template}} visibility issues that I'd like
to avoid going down that rabbit hole if possible.
Obviously this is not urgent work -- I place it in the bucket of future
performance optimization (I would guess that chunking at the 16MB/32MB
granularity will improve the performance of many applications).
> [C++] Add chunked builder classes
> ---------------------------------
>
> Key: ARROW-2532
> URL: https://issues.apache.org/jira/browse/ARROW-2532
> Project: Apache Arrow
> Issue Type: Improvement
> Components: C++
> Affects Versions: 0.9.0
> Reporter: Antoine Pitrou
> Priority: Major
>
> I think it would be useful to have chunked builders for list, string and
> binary types. A chunked builder would produce a chunked array as output,
> circumventing the 32-bit offset limit of those types. There's some
> special-casing scattered around our NumPy conversion routines right now.