[ https://issues.apache.org/jira/browse/ARROW-602?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15954254#comment-15954254 ]

Wes McKinney commented on ARROW-602:
------------------------------------

Hi [~JohanMabille], thank you very much for writing this spec document. 

I think using an STL-compatible interface to Arrow data structures would be 
really useful. As for the data structures defined in 
https://github.com/apache/arrow/blob/master/cpp/src/arrow/array.h, my feeling 
is that they should remain "plain old data" with as few features as 
necessary beyond access to their metadata and data buffers. There are a 
couple of convenience methods on {{arrow::Array}} and its subclasses for 
equality, slicing, and simple value access, but beyond that I am not sure we 
should add much to these classes (I'd be more in favor of making 
{{array.h}} smaller than making it bigger). 

What I'm envisioning is something like:

{code}
std::shared_ptr<Array> my_data = ...;

// Proposed accessor (not part of the current API) that views my_data as int64 values
arrow::ArrayAccessor<Int64Type> container(*my_data);
{code}

From here, {{container}} would unbox the memory in {{my_data}} and implement 
the interfaces which you've described in your document. We'll have to make 
decisions about the return type of {{operator[]}}: for this example it could 
perhaps return {{std::optional<int64_t>}} so that null slots can be 
represented, but the return type for nested types may be more complicated. 
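To make the "unboxing" idea concrete, here is a minimal, self-contained sketch that does not use the real Arrow library. It assumes a hypothetical {{ToyInt64Array}} (a values pointer plus an optional validity bitmap, bit i set meaning slot i is valid, least-significant bit first as in the Arrow format) and a hypothetical {{Int64Accessor}} standing in for the proposed {{ArrayAccessor<Int64Type>}}. The accessor only reads the buffers, so the array type itself stays plain old data, and {{operator[]}} returns {{std::optional<int64_t>}} as suggested above.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>

// Hypothetical, simplified stand-in for an Arrow Int64 array: a values
// buffer plus an optional validity bitmap (bit i set => slot i is valid),
// numbered least-significant bit first.
struct ToyInt64Array {
  const int64_t* values = nullptr;
  const uint8_t* validity = nullptr;  // nullptr means "no nulls"
  std::size_t length = 0;
};

// Hypothetical accessor in the spirit of the proposed ArrayAccessor:
// it reads the buffers but adds no behavior to the array type itself.
class Int64Accessor {
 public:
  explicit Int64Accessor(const ToyInt64Array& arr) : arr_(arr) {}

  std::size_t size() const { return arr_.length; }

  // Test bit i of the validity bitmap; a missing bitmap means all valid.
  bool is_valid(std::size_t i) const {
    return arr_.validity == nullptr ||
           ((arr_.validity[i / 8] >> (i % 8)) & 1) != 0;
  }

  // Null slots come back as std::nullopt instead of an arbitrary value.
  std::optional<int64_t> operator[](std::size_t i) const {
    if (!is_valid(i)) return std::nullopt;
    return arr_.values[i];
  }

 private:
  ToyInt64Array arr_;
};
```

For example, with values {{10, 20, 30}} and bitmap {{0b00000101}}, slots 0 and 2 yield 10 and 30 while slot 1 yields {{std::nullopt}}. Nested types would need a richer return type than {{std::optional}}, which this sketch does not attempt.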

While Arrow memory is intended to be immutable for most applications, if the 
buffers in an array are mutable (e.g. {{my_data->data()->is_mutable()}} is 
true) then this container could permit mutation, subject to const-ness. 
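One way to read "permit mutation, subject to const-ness" is sketched below: reads are {{const}} member functions, writes are non-{{const}} and are additionally rejected when the underlying buffer is not mutable. {{ToyBuffer}}, its {{is_mutable}} flag (standing in for the {{is_mutable()}} check mentioned above), and {{MutableInt64Accessor}} are all hypothetical names for illustration; this is not Arrow's API, and throwing on an immutable buffer is just one possible policy.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical toy buffer carrying a mutability flag, standing in for
// checking is_mutable() on the real Arrow buffers before writing.
struct ToyBuffer {
  std::vector<int64_t> data;
  bool is_mutable = false;
};

// Hypothetical accessor: reads are always allowed through const members;
// writes need a non-const accessor AND a mutable buffer.
class MutableInt64Accessor {
 public:
  explicit MutableInt64Accessor(ToyBuffer& buf) : buf_(buf) {}

  // Read access, available on const accessors.
  int64_t get(std::size_t i) const { return buf_.data.at(i); }

  // Write access: not callable on a const accessor, and refused at
  // runtime when the buffer is marked immutable.
  void set(std::size_t i, int64_t v) {
    if (!buf_.is_mutable) {
      throw std::logic_error("buffer is immutable");
    }
    buf_.data.at(i) = v;
  }

 private:
  ToyBuffer& buf_;
};
```

Holding a {{const MutableInt64Accessor&}} gives read-only access at compile time, while the runtime flag covers buffers that are shared or otherwise immutable.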

Does this make sense? 

> C++: Provide iterator access to primitive elements inside a 
> Column/ChunkedArray
> -------------------------------------------------------------------------------
>
>                 Key: ARROW-602
>                 URL: https://issues.apache.org/jira/browse/ARROW-602
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: C++
>            Reporter: Uwe L. Korn
>              Labels: beginner, newbie
>
> Given a ChunkedArray, an Arrow user must currently iterate over all its 
> chunks and then cast them to their types to extract the primitive memory 
> regions to access the values. A convenient way to access the underlying 
> values would be to offer a function that takes a ChunkedArray and returns a 
> C++ iterator over all elements.
> While this may not be the most performant way to access the underlying data,
> it should perform well enough and would add a convenience layer for new
> users.
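The iterator requested in the issue description can be sketched as a forward iterator that walks every chunk in turn, so callers never deal with chunk boundaries. This is a minimal sketch under simplifying assumptions: the hypothetical {{ToyChunkedInt64}} models a ChunkedArray as a list of {{std::vector<int64_t>}} chunks, nulls are ignored, and {{ChunkedIterator}} / {{ChunkedRange}} are illustrative names, not Arrow classes. A real version would cast each chunk to its concrete array type and would likely yield optional values.

```cpp
#include <cstddef>
#include <cstdint>
#include <iterator>
#include <vector>

// Hypothetical stand-in for a ChunkedArray of int64 values: a list of
// contiguous chunks. Nulls are ignored in this sketch for brevity.
using ToyChunkedInt64 = std::vector<std::vector<int64_t>>;

// Forward iterator over every element of every chunk, in order,
// skipping empty chunks transparently.
class ChunkedIterator {
 public:
  using iterator_category = std::forward_iterator_tag;
  using value_type = int64_t;
  using difference_type = std::ptrdiff_t;
  using pointer = const int64_t*;
  using reference = const int64_t&;

  ChunkedIterator() = default;
  ChunkedIterator(const ToyChunkedInt64* chunks, std::size_t chunk,
                  std::size_t pos)
      : chunks_(chunks), chunk_(chunk), pos_(pos) {
    skip_empty();
  }

  reference operator*() const { return (*chunks_)[chunk_][pos_]; }
  pointer operator->() const { return &**this; }

  ChunkedIterator& operator++() {
    ++pos_;
    skip_empty();
    return *this;
  }
  ChunkedIterator operator++(int) {
    ChunkedIterator tmp = *this;
    ++*this;
    return tmp;
  }

  bool operator==(const ChunkedIterator& o) const {
    return chunk_ == o.chunk_ && pos_ == o.pos_;
  }
  bool operator!=(const ChunkedIterator& o) const { return !(*this == o); }

 private:
  // Move to the next chunk while the position is past the current chunk's end.
  void skip_empty() {
    while (chunk_ < chunks_->size() && pos_ >= (*chunks_)[chunk_].size()) {
      ++chunk_;
      pos_ = 0;
    }
  }

  const ToyChunkedInt64* chunks_ = nullptr;
  std::size_t chunk_ = 0;
  std::size_t pos_ = 0;
};

// Range wrapper so chunked data works with range-based for loops and
// standard algorithms; end() is "one past the last chunk".
struct ChunkedRange {
  const ToyChunkedInt64& chunks;
  ChunkedIterator begin() const { return ChunkedIterator(&chunks, 0, 0); }
  ChunkedIterator end() const {
    return ChunkedIterator(&chunks, chunks.size(), 0);
  }
};
```

With chunks {{{1, 2}, {}, {3}, {4, 5}}}, a range-based for loop over {{ChunkedRange}} visits 1, 2, 3, 4, 5 without copying any chunk; the per-element cost is an index comparison, which fits the "convenient rather than fastest" goal stated in the issue.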



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
