On Tue, Apr 21, 2020 at 6:34 AM Yue Ni <niyue....@gmail.com> wrote:
>
> Hi there,
>
> I am currently using the Gandiva C++ library to do projection/selection on
> Arrow record batches. Some fields in my record batch are dictionary
> encoded, and I wonder how I can apply Gandiva functions to these
> fields.
>
> Currently, no Gandiva function has a signature that supports dictionary
> arrays. If I use the dictionary array's value type to compose a Gandiva
> function expression and create a projector, it reports "Field definition
> in schema my_field dictionary<values=string, indices=int8, ordered=0>
> different from field in expression my_field:string", which is expected.
>
> I would like to know how to solve this problem in arrow/gandiva, more
> specifically:
> 1) Do I need to convert a dictionary array into a non dictionary
> encoded array for applying such a projection?

Currently, yes.

> 2) Is there any API in Arrow that allows me to convert a dictionary array
> into a non dictionary encoded array easily?

Yes, use arrow::compute::Cast with the dense (non-dictionary) value type as
the target type, for example arrow::utf8() for a dictionary of strings.
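
To make clear what that cast produces, here is a minimal sketch of the
decoding step in plain C++, without Arrow. DictEncoded and Decode are
hypothetical names used only for illustration; they are not Arrow APIs, and
the sketch ignores nulls. In recent Arrow C++ releases the corresponding call
is roughly arrow::compute::Cast(*dict_array, arrow::utf8()), but the exact
signature has changed across releases, so check the header for your version.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for a dictionary-encoded string column:
// each row stores a small integer index into a table of distinct values.
struct DictEncoded {
    std::vector<std::string> dictionary;  // distinct values
    std::vector<std::int8_t> indices;     // one index per row
};

// Materialize the dense column: dense[i] = dictionary[indices[i]].
// Throws std::out_of_range if an index does not point into the dictionary.
std::vector<std::string> Decode(const DictEncoded& col) {
    std::vector<std::string> dense;
    dense.reserve(col.indices.size());
    for (std::int8_t idx : col.indices) {
        if (idx < 0 || static_cast<std::size_t>(idx) >= col.dictionary.size()) {
            throw std::out_of_range("dictionary index out of range");
        }
        dense.push_back(col.dictionary[static_cast<std::size_t>(idx)]);
    }
    return dense;
}
```

The dense result has one full value per row, which is the layout a projector
built against my_field:string expects.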

> 3) Initially I thought a dictionary array could be accessed with the same
> API as other arrays, since dictionary encoding seems to me a mechanism for
> organizing the data internally in the array. I expected to access a value
> in a dictionary array as in a normal array, for example
> dict_array->Value(i), but it turns out users need a different API to
> access the values (get the indices and the dictionary, then retrieve the
> value). Because of this API difference, clients of the Arrow API have to
> handle dictionary arrays and normal arrays differently. Is there any
> approach/plan to make this transparent to API clients?

There's no plan that I'm aware of, but you are welcome to propose one.
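
As a starting point for such a proposal, a uniform accessor could look
something like the sketch below. This is plain C++17 with hypothetical types
(DenseColumn, DictColumn, Column, ValueAt); none of these exist in Arrow, and
the sketch only illustrates hiding the indices/dictionary lookup behind a
single call.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <variant>
#include <vector>

// Hypothetical dense column: one string per row.
using DenseColumn = std::vector<std::string>;

// Hypothetical dictionary-encoded column: indices into distinct values.
struct DictColumn {
    std::vector<std::string> dictionary;
    std::vector<std::int32_t> indices;
};

// A column is stored either densely or dictionary-encoded.
using Column = std::variant<DenseColumn, DictColumn>;

// Return the logical value of row i regardless of the physical encoding.
// Uses at(), so an out-of-range row or index throws std::out_of_range.
const std::string& ValueAt(const Column& col, std::size_t i) {
    if (const auto* dense = std::get_if<DenseColumn>(&col)) {
        return dense->at(i);
    }
    const auto& dict = std::get<DictColumn>(col);
    return dict.dictionary.at(static_cast<std::size_t>(dict.indices.at(i)));
}
```

Client code would then call ValueAt(col, i) for both encodings, at the cost
of an extra branch per access.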

> Thanks.
