[
https://issues.apache.org/jira/browse/ARROW-542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15860292#comment-15860292
]
Emilio Lahr-Vivaz commented on ARROW-542:
-----------------------------------------
Another blocker I'm hitting is that I don't see any way that the type of a
dictionary block can be determined during read. DictionaryEncoding has an
indexType, but that seems to refer to the ints used to reference the dictionary
values:
https://github.com/apache/arrow/blob/b99d049c3d1894908b7e52774eb657675dc1f439/format/Message.fbs#L165
A dictionary-encoded vector currently has its type defined as the dictionary
index type, but the type of the dictionary is not defined. It works when the
data is in memory with the dictionary alongside it, but not when encoding to
the file format... Possibly the dictionary-encoded vector should specify the
dictionary type? It seems like either that or the message format needs another
field for the dictionary type.
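
To make the gap concrete, here is a minimal sketch in plain Java. It does not
use the Arrow Java API: DictionaryTypeSketch, SketchType, EncodedField and the
valueType field are all hypothetical names chosen for illustration. Only
indexType corresponds to something in the message format linked above. The
sketch assumes a reader needs the value type before it can interpret a
dictionary block, and models the current situation as valueType being absent.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class DictionaryTypeSketch {

    // Hypothetical stand-ins for Arrow types; not the real Arrow Java classes.
    enum SketchType { INT32, UTF8 }

    // Field metadata for a dictionary-encoded column. indexType mirrors
    // DictionaryEncoding.indexType; valueType is the assumed extra field,
    // and null models the gap described in the comment.
    static final class EncodedField {
        final long dictionaryId;
        final SketchType indexType;
        final SketchType valueType;

        EncodedField(long dictionaryId, SketchType indexType, SketchType valueType) {
            this.dictionaryId = dictionaryId;
            this.indexType = indexType;
            this.valueType = valueType;
        }

        // A reader can only interpret the dictionary block if it knows the value type.
        boolean canReadDictionary() {
            return valueType != null;
        }
    }

    // Resolve integer indices against an already-decoded dictionary of strings.
    static List<String> decode(int[] indices, List<String> dictionary) {
        List<String> out = new ArrayList<>(indices.length);
        for (int index : indices) {
            out.add(dictionary.get(index));
        }
        return out;
    }

    public static void main(String[] args) {
        // Metadata as described in the comment: index type only.
        EncodedField indexOnly = new EncodedField(0L, SketchType.INT32, null);
        // Metadata with the assumed additional value type.
        EncodedField withValueType = new EncodedField(0L, SketchType.INT32, SketchType.UTF8);

        System.out.println(indexOnly.canReadDictionary());
        System.out.println(withValueType.canReadDictionary());
        System.out.println(decode(new int[] {1, 0, 1}, Arrays.asList("a", "b")));
    }
}
```

Whether the value type belongs on the vector's field or in the dictionary
message itself is the open question in the comment; the sketch only shows that
the index type alone is not enough on the read side.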
> [Java] Implement dictionaries in stream/file encoding
> -----------------------------------------------------
>
> Key: ARROW-542
> URL: https://issues.apache.org/jira/browse/ARROW-542
> Project: Apache Arrow
> Issue Type: Improvement
> Components: Java - Vectors
> Reporter: Emilio Lahr-Vivaz
> Assignee: Emilio Lahr-Vivaz
>
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)