On 25/08/2021 at 20:02, roee shlomo wrote:
> In Java, the dictionary vector is completely separate from the encoded
> vector. Typically, a DictionaryProvider is available alongside a dictionary
> encoded vector (to provide dictionaries for the vector and its children).
> On the other hand, the C Data Interface bundles the dictionary into the
> array.

> This means that an API to import an ArrowSchema (in C) into a Field/Schema
> (in Java) is not suitable for dictionary encoded arrays because there is an
> information loss. Specifically, there is nothing in Field/Schema to
> indicate the value type as far as we can tell.

I'll let people acquainted with the Java implementation answer here.

(but I'm a bit surprised that the dictionary value type is not part of the Schema definition, if I'm reading you correctly; in the IPC format, a Field encodes both the value type - in `Field.type` - and the dictionary index type - in `Field.dictionary.indexType`)
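
To make the mismatch described above concrete, here is a minimal sketch in plain Java. It does not use the Arrow Java library: `CSchema`, `JavaField` and `Provider` are hypothetical stand-ins that only model the shapes discussed in this thread (a C-side schema that bundles its dictionary, and a Java-side field that carries an index type and a dictionary id but no value type). The format strings `"i"` (int32) and `"u"` (utf8) follow the C Data Interface convention; everything else is an assumption made for illustration.

```java
import java.util.HashMap;
import java.util.Map;

public class DictionaryImportSketch {

    // Hypothetical model of a C-side schema: for a dictionary-encoded column,
    // `format` is the index type and `dictionary` describes the value type.
    static final class CSchema {
        final String format;
        final CSchema dictionary; // null when the column is not dictionary-encoded

        CSchema(String format, CSchema dictionary) {
            this.format = format;
            this.dictionary = dictionary;
        }
    }

    // Hypothetical model of a Java-side field: index type plus a dictionary id.
    // There is deliberately no slot for the value type.
    static final class JavaField {
        final String indexType;
        final Long dictionaryId; // null when the field is not dictionary-encoded

        JavaField(String indexType, Long dictionaryId) {
            this.indexType = indexType;
            this.dictionaryId = dictionaryId;
        }
    }

    // Hypothetical stand-in for a dictionary provider: maps ids to value types.
    static final class Provider {
        private final Map<Long, String> valueTypes = new HashMap<>();
        private long nextId = 0;

        long register(String valueType) {
            long id = nextId++;
            valueTypes.put(id, valueType);
            return id;
        }

        String lookup(long id) {
            return valueTypes.get(id);
        }
    }

    // Importing a schema alone would drop the value type; it survives only
    // because it is registered in the provider that travels alongside.
    static JavaField importField(CSchema schema, Provider provider) {
        if (schema.dictionary == null) {
            return new JavaField(schema.format, null);
        }
        long id = provider.register(schema.dictionary.format);
        return new JavaField(schema.format, id);
    }

    public static void main(String[] args) {
        Provider provider = new Provider();
        CSchema encoded = new CSchema("i", new CSchema("u", null));
        JavaField field = importField(encoded, provider);
        System.out.println("index type: " + field.indexType);
        System.out.println("value type via provider: " + provider.lookup(field.dictionaryId));
    }
}
```

In this model the field on its own cannot answer "what is the value type?"; only the provider can. That is the information loss a schema-only import would face for dictionary-encoded arrays.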

> We would like to submit a PR without dictionary support first and mark the
> API as experimental. We would like to address dictionary support
> separately, with the help of the community. Is that acceptable?

To me, definitely.

Regards

Antoine.
