[
https://issues.apache.org/jira/browse/ARROW-602?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15954254#comment-15954254
]
Wes McKinney commented on ARROW-602:
------------------------------------
hi [~JohanMabille], thank you very much for writing this spec document.
I think using an STL-compatible interface to Arrow data structures would be
really useful. As for the data structures defined in
https://github.com/apache/arrow/blob/master/cpp/src/arrow/array.h, my feeling
is that they should remain as "plain old data" with as few features as
necessary beyond access to their metadata and data buffers -- there are a
couple of convenience methods on {{arrow::Array}} and its subclasses for
equality, slicing, and simple value access, but beyond that I am not sure we
should add very much to these classes (I'd be more in favor of making
{{array.h}} smaller than making it bigger).
What I'm envisioning is something like:
{code}
std::shared_ptr<Array> my_data = ...;
arrow::ArrayAccessor<Int64Type> container(*my_data);
{code}
From here, {{container}} would unbox the memory in {{my_data}} and implement
the interfaces which you've described in your document. We'll have to make
decisions about the return value for {{operator[]}}, like perhaps it will
return {{std::optional<int64_t>}} for this example, but the return type for
nested types may be more complicated.
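To make the idea a bit more concrete, here is a rough, self-contained sketch with no Arrow dependency. {{Int64ArrayView}}, {{Int64Accessor}} and {{SumValid}} are hypothetical names used only for illustration; they stand in for the real array classes and for whatever the accessor ends up being called. The sketch assumes a values buffer plus an Arrow-style validity bitmap (least-significant bit first) and returns {{std::optional<int64_t>}} by value, which is why the iterator is only tagged as an input iterator.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <iterator>
#include <optional>

// Hypothetical stand-in for the buffers of an int64 Arrow array.
struct Int64ArrayView {
  const int64_t* values;    // one slot per element
  const uint8_t* validity;  // bit i set => slot i is non-null; nullptr => no nulls
  std::size_t length;
};

// Hypothetical read-only container over the view above (not an Arrow class).
class Int64Accessor {
 public:
  using value_type = std::optional<int64_t>;
  using size_type = std::size_t;

  class const_iterator {
   public:
    using iterator_category = std::input_iterator_tag;
    using value_type = std::optional<int64_t>;
    using difference_type = std::ptrdiff_t;
    using pointer = const value_type*;
    using reference = value_type;  // elements are returned by value

    const_iterator(const Int64Accessor* owner, std::size_t pos)
        : owner_(owner), pos_(pos) {}
    reference operator*() const { return (*owner_)[pos_]; }
    const_iterator& operator++() { ++pos_; return *this; }
    const_iterator operator++(int) { const_iterator tmp = *this; ++pos_; return tmp; }
    bool operator==(const const_iterator& o) const { return pos_ == o.pos_; }
    bool operator!=(const const_iterator& o) const { return pos_ != o.pos_; }

   private:
    const Int64Accessor* owner_;
    std::size_t pos_;
  };

  explicit Int64Accessor(const Int64ArrayView& view) : view_(view) {}

  size_type size() const { return view_.length; }

  // Null slots come back as std::nullopt, valid slots as the stored value.
  value_type operator[](size_type i) const {
    assert(i < view_.length);
    if (view_.validity != nullptr &&
        ((view_.validity[i / 8] >> (i % 8)) & 1) == 0) {
      return std::nullopt;
    }
    return view_.values[i];
  }

  const_iterator begin() const { return const_iterator(this, 0); }
  const_iterator end() const { return const_iterator(this, view_.length); }

 private:
  Int64ArrayView view_;
};

// Example use: sum the non-null elements with a range-based for loop.
inline int64_t SumValid(const Int64Accessor& c) {
  int64_t total = 0;
  for (const auto& v : c) {
    if (v) total += *v;
  }
  return total;
}
```

Returning the optional by value keeps the array classes untouched, at the cost of not being able to hand out references to elements; a proxy reference type would be one alternative.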
While Arrow memory is intended to be immutable for most applications, if the
buffers in an array are mutable (e.g. {{my_data->data()->is_mutable()}} is
true) then this container could permit mutation, subject to const-ness.
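As a sketch of the mutation point only (nulls are ignored here for brevity): {{Int64Buffer}} and {{MutableInt64Accessor}} are hypothetical stand-ins, and the {{is_mutable}} flag is a plain field assumed for illustration. Reads go through a const method, while writes need both a non-const container and a mutable buffer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>

// Hypothetical stand-in for a buffer that may or may not be writable.
struct Int64Buffer {
  int64_t* data;
  std::size_t length;
  bool is_mutable;
};

// Hypothetical container: reads always work; writes require a non-const
// container and a buffer that reports itself as mutable.
class MutableInt64Accessor {
 public:
  explicit MutableInt64Accessor(Int64Buffer buffer) : buffer_(buffer) {}

  std::size_t size() const { return buffer_.length; }

  // Read access is available on const and non-const containers alike.
  int64_t operator[](std::size_t i) const {
    assert(i < buffer_.length);
    return buffer_.data[i];
  }

  // Write access is a non-const member, so it cannot be called through a
  // const reference; it also checks buffer mutability at run time.
  void set(std::size_t i, int64_t value) {
    assert(i < buffer_.length);
    if (!buffer_.is_mutable) {
      throw std::logic_error("buffer is not mutable");
    }
    buffer_.data[i] = value;
  }

 private:
  Int64Buffer buffer_;
};
```

Whether an immutable buffer should be rejected at run time like this, or at compile time by offering separate read-only and read-write container types, is one of the design decisions still to be made.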
Does this make sense?
> C++: Provide iterator access to primitive elements inside a
> Column/ChunkedArray
> -------------------------------------------------------------------------------
>
> Key: ARROW-602
> URL: https://issues.apache.org/jira/browse/ARROW-602
> Project: Apache Arrow
> Issue Type: Improvement
> Components: C++
> Reporter: Uwe L. Korn
> Labels: beginner, newbie
>
> Given a ChunkedArray, an Arrow user must currently iterate over all its
> chunks and then cast them to their types to extract the primitive memory
> regions to access the values. A convenient way to access the underlying
> values would be to offer a function that takes a ChunkedArray and returns a
> C++ iterator over all elements.
> While this may not be the most performant way to access the underlying data,
> it should have sufficient performance and adds a convenience layer for new
> users.
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)