[jira] [Commented] (ARROW-542) [Java] Implement dictionaries in stream/file encoding

Emilio Lahr-Vivaz (JIRA) Thu, 09 Feb 2017 09:34:22 -0800

    [ https://issues.apache.org/jira/browse/ARROW-542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15859863#comment-15859863 ]


Emilio Lahr-Vivaz commented on ARROW-542:
-----------------------------------------

It's getting a little complicated trying to encode/decode the dictionaries,
given the interplay between the readers/writers, the vector loader/unloader,
and the ArrowRecordBatch. Right now I'm trying to rely on finding
DictionaryVector class instances, but that breaks down once vectors start
getting encoded. The two-step process between the vector loaders/unloaders and
the file readers/writers makes it hard to track state. The ArrowRecordBatch
that is passed around doesn't even include any Field data. It seems like it
would be more straightforward to require the user to set the dictionary IDs up
front in the Field. The dictionary ID is defined as a Long, which seems to
imply that the IDs were not meant to be entirely transient (otherwise the type
could be an Int or smaller). [~wesmckinn] thoughts? I realize this goes against
what you've been saying.
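
A rough sketch of what "set the dictionary IDs up front in the Field" could
look like. This is only an illustration of the idea, not the Arrow API: every
name here (DictionaryIdSketch, SketchField, SketchDictionaryProvider, encode,
decode) is hypothetical, and plain Java collections stand in for vectors. The
point is that the encoder and decoder only need the Field's ID plus a lookup
table, so no state has to be tracked across the loader/unloader and
reader/writer steps.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class DictionaryIdSketch {

    /** Hypothetical stand-in for a Field. A null dictionaryId means "not dictionary-encoded". */
    public static final class SketchField {
        public final String name;
        public final Long dictionaryId;

        public SketchField(String name, Long dictionaryId) {
            this.name = name;
            this.dictionaryId = dictionaryId;
        }
    }

    /** Hypothetical lookup table from a user-assigned dictionary ID to the dictionary values. */
    public static final class SketchDictionaryProvider {
        private final Map<Long, List<String>> dictionaries = new HashMap<>();

        public void put(long id, List<String> values) {
            dictionaries.put(id, values);
        }

        public List<String> lookup(long id) {
            List<String> values = dictionaries.get(id);
            if (values == null) {
                throw new IllegalArgumentException("No dictionary registered for id " + id);
            }
            return values;
        }
    }

    /** Replaces each value with its index in the dictionary named by the field's ID. */
    public static int[] encode(SketchField field, SketchDictionaryProvider provider, List<String> values) {
        if (field.dictionaryId == null) {
            throw new IllegalArgumentException("Field " + field.name + " has no dictionary id");
        }
        List<String> dictionary = provider.lookup(field.dictionaryId);
        int[] indices = new int[values.size()];
        for (int i = 0; i < values.size(); i++) {
            int index = dictionary.indexOf(values.get(i));
            if (index < 0) {
                throw new IllegalArgumentException("Value not in dictionary: " + values.get(i));
            }
            indices[i] = index;
        }
        return indices;
    }

    /** Maps indices back to values using only the field's ID and the provider. */
    public static List<String> decode(SketchField field, SketchDictionaryProvider provider, int[] indices) {
        if (field.dictionaryId == null) {
            throw new IllegalArgumentException("Field " + field.name + " has no dictionary id");
        }
        List<String> dictionary = provider.lookup(field.dictionaryId);
        List<String> values = new ArrayList<>(indices.length);
        for (int index : indices) {
            values.add(dictionary.get(index));
        }
        return values;
    }

    public static void main(String[] args) {
        SketchDictionaryProvider provider = new SketchDictionaryProvider();
        provider.put(1L, Arrays.asList("foo", "bar", "baz"));

        // The user assigns the ID when building the field, before any encoding happens.
        SketchField field = new SketchField("category", 1L);

        int[] encoded = encode(field, provider, Arrays.asList("baz", "foo", "baz"));
        System.out.println(Arrays.toString(encoded));
        System.out.println(decode(field, provider, encoded));
    }
}
```

With this shape, a record batch would not need to carry Field data itself, as
long as the schema's Fields and the provider are available wherever batches
are read or written.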

> [Java] Implement dictionaries in stream/file encoding
> -----------------------------------------------------
>
>                 Key: ARROW-542
>                 URL: https://issues.apache.org/jira/browse/ARROW-542
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: Java - Vectors
>            Reporter: Emilio Lahr-Vivaz
>            Assignee: Emilio Lahr-Vivaz
>




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
