[ https://issues.apache.org/jira/browse/ARROW-542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15859863#comment-15859863 ]
Emilio Lahr-Vivaz commented on ARROW-542:
-----------------------------------------
Encoding and decoding the dictionaries is getting a little complicated, given
the interplay between the readers/writers, the vector loader/unloader, and the
ArrowRecordBatch. Right now I'm relying on finding DictionaryVector class
instances, but that breaks down once things start getting encoded. The
two-step process between the vector loaders/unloaders and the file
readers/writers makes it hard to track state, and the ArrowRecordBatch that is
passed around doesn't include any Field data. It seems like it would be
more straightforward to require the user to set the dictionary IDs up front in
the Field. The dictionary ID is defined as a Long, which seems to imply that
IDs were not meant to be entirely transient (otherwise an Int or something
smaller would do). [~wesmckinn] thoughts? I realize this goes against what
you've been saying.
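
To make the proposal concrete, here is a rough sketch of what setting the
dictionary ID up front in the Field might look like. None of these classes are
the actual Arrow API: SketchField, SketchDictionaryProvider and DictionarySketch
are hypothetical stand-ins, and dictionaries are simplified to lists of strings.
The point is only that encode/decode can resolve the dictionary from the field
itself, without inspecting vector class instances.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a schema field; NOT the real Arrow Field class.
final class SketchField {
    final String name;
    final Long dictionaryId; // null means the field is not dictionary-encoded

    SketchField(String name, Long dictionaryId) {
        this.name = name;
        this.dictionaryId = dictionaryId;
    }
}

// Hypothetical registry mapping a user-assigned dictionary ID to its values.
final class SketchDictionaryProvider {
    private final Map<Long, List<String>> dictionaries = new HashMap<>();

    void put(long id, List<String> values) {
        dictionaries.put(id, values);
    }

    List<String> lookup(SketchField field) {
        if (field.dictionaryId == null) {
            throw new IllegalArgumentException("Field " + field.name + " has no dictionary ID");
        }
        List<String> values = dictionaries.get(field.dictionaryId);
        if (values == null) {
            throw new IllegalArgumentException("No dictionary registered for ID " + field.dictionaryId);
        }
        return values;
    }
}

final class DictionarySketch {
    // Replace each value with its index in the dictionary named by the field.
    static int[] encode(SketchField field, List<String> values, SketchDictionaryProvider provider) {
        List<String> dictionary = provider.lookup(field);
        int[] indices = new int[values.size()];
        for (int i = 0; i < values.size(); i++) {
            int index = dictionary.indexOf(values.get(i));
            if (index < 0) {
                throw new IllegalArgumentException("Value not in dictionary: " + values.get(i));
            }
            indices[i] = index;
        }
        return indices;
    }

    // Map indices back to values using the same field-level dictionary ID.
    static List<String> decode(SketchField field, int[] indices, SketchDictionaryProvider provider) {
        List<String> dictionary = provider.lookup(field);
        List<String> values = new ArrayList<>(indices.length);
        for (int index : indices) {
            values.add(dictionary.get(index));
        }
        return values;
    }

    public static void main(String[] args) {
        SketchDictionaryProvider provider = new SketchDictionaryProvider();
        provider.put(1L, Arrays.asList("foo", "bar", "baz"));
        SketchField field = new SketchField("tag", 1L); // ID chosen by the user up front

        int[] encoded = encode(field, Arrays.asList("baz", "foo", "baz"), provider);
        System.out.println(Arrays.toString(encoded));
        System.out.println(decode(field, encoded, provider));
    }
}
```

Because the ID travels with the field, a reader or writer that only sees the
schema and a batch of index values would still have enough information to find
the right dictionary.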
> [Java] Implement dictionaries in stream/file encoding
> -----------------------------------------------------
>
> Key: ARROW-542
> URL: https://issues.apache.org/jira/browse/ARROW-542
> Project: Apache Arrow
> Issue Type: Improvement
> Components: Java - Vectors
> Reporter: Emilio Lahr-Vivaz
> Assignee: Emilio Lahr-Vivaz
>