Hi Rares,

I think there is much more potential for confusion with binary values
(vs. fixed-width values) because any array may have a non-zero offset
(from being sliced).

If you have

const int32_t* value_offsets = binary_arr.raw_value_offsets();

then this accounts for any non-zero slice offset
(binary_arr.offset()). The problem IMHO with having

const uint8_t* values = binary_arr.raw_values();

is that you have two choices, neither of them good:

* Return a pointer to where the data for that array starts (including
any offset). But then you cannot index into this array with the values
from raw_value_offsets()
* Return a pointer to the memory inside the data buffer (not
accounting for the offset), but then raw_values() has inconsistent
semantics with other raw_values methods

There is already the value_data() method which returns the data
buffer, so if you want the raw data you can do

const uint8_t* raw_data = binary_arr.value_data()->data();

https://github.com/apache/arrow/blob/master/cpp/src/arrow/array.h#L451

This is something you can index into with the result of
raw_value_offsets(). Or you can use the BinaryArray::GetValue method.
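As a rough illustration of that indexing, here is a minimal sketch in
plain C++ that does not use Arrow at all. GetValueAt is a hypothetical
helper, and its offsets and data arguments only stand in for what
raw_value_offsets() and value_data()->data() would give you; treat it
as a model of the layout, not as Arrow's implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper (not part of Arrow): read value i from a binary
// layout, given a pointer to the offsets and a pointer to the START of
// the data buffer. Value i occupies bytes [offsets[i], offsets[i + 1]).
std::string GetValueAt(const int32_t* offsets, const uint8_t* data,
                       int64_t i) {
  assert(i >= 0);
  const int32_t start = offsets[i];
  const int32_t end = offsets[i + 1];
  assert(start <= end);
  return std::string(reinterpret_cast<const char*>(data) + start,
                     static_cast<std::size_t>(end - start));
}
```

The point of the sketch is that the offsets are positions relative to
the start of the data buffer, so the data pointer passed in has to be
the unadjusted one.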

To make sure I'm getting the issue across clearly, consider a binary
array with 5 values

a bb ccc dddd eeeee

This has buffers:

length: 5
offset: 0
buffer[1] (offset): [0, 1, 3, 6, 10, 15]
buffer[2] (data): abbcccddddeeeee

Now suppose you slice this array, say

auto sliced = arr->Slice(2);

Now the sliced array has:

length: 3
offset: 2
buffer[1] (offset): [0, 1, 3, 6, 10, 15]
buffer[2] (data): abbcccddddeeeee
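
To make the sliced case concrete, here is a small model in plain C++.
BinarySlice and MisreadStart are hypothetical names invented for this
sketch and are not Arrow classes or functions; the struct only mimics
the idea that a slice shares both buffers with its parent and changes
nothing but offset and length.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-in for a sliced binary array (not Arrow's classes).
struct BinarySlice {
  const std::vector<int32_t>* offsets;  // buffer[1], shared and unsliced
  const std::string* data;              // buffer[2], shared and unsliced
  int64_t offset;                       // slice offset, in elements
  int64_t length;                       // number of values in the slice

  // Models raw_value_offsets(): already advanced by the slice offset.
  const int32_t* raw_value_offsets() const {
    return offsets->data() + offset;
  }

  // Models value_data()->data(): start of the buffer, no adjustment.
  const char* data_start() const { return data->data(); }

  // Correct read: offsets index from the start of the data buffer.
  std::string Value(int64_t i) const {
    assert(i >= 0 && i < length);
    const int32_t* off = raw_value_offsets();
    return std::string(data_start() + off[i],
                       static_cast<std::size_t>(off[i + 1] - off[i]));
  }
};

// Byte position that would be read for value i if the data pointer were
// ALSO advanced to the slice's first byte: the shift is applied twice.
int64_t MisreadStart(const BinarySlice& s, int64_t i) {
  const int32_t* off = s.raw_value_offsets();
  return static_cast<int64_t>(off[0]) + off[i];
}
```

With the buffers above and offset 2, raw_value_offsets() starts at the
entry 3, so Value(0) reads bytes 3 to 6 of the unsliced buffer, while a
data pointer that had also been moved forward would start reading at
byte 6, inside the next value.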

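Here raw_value_offsets() points at the third entry of the offsets
buffer, so it reads [3, 6, 10, 15], and those values are still
positions in the unsliced data buffer. Because of the offsets and the
potential for confusion with zero-copy array slices, I think that if
you want to interact with the raw data you should go directly to the
buffer (value_data()->data()).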

- Wes

On Sun, Sep 17, 2017 at 2:40 PM, Rares Vernica <[email protected]> wrote:
> Hi,
>
> I have a question about the Array C++ API. BinaryArray has a
> raw_value_offsets() public member. Should it also have a raw_values() public
> member to give a pointer to the start of raw data? Or is this not feasible?
>
> Thanks,
> Rares
