[ https://issues.apache.org/jira/browse/ARROW-692?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16045207#comment-16045207 ]
Bryan Cutler commented on ARROW-692:
------------------------------------
Updated the sample JSON: the Field schema should use the dictionary type (utf8
here), not the index type.
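To illustrate the distinction, here is a minimal sketch in plain Python. It is not the Arrow API and not the exact integration JSON layout; the helper `dictionary_encode` and the keys in `field` (including `index_type`) are hypothetical names chosen for illustration. The point is only that the field advertises the type of the dictionary values (utf8), while the integer index type belongs to the encoding metadata.

```python
# Hypothetical illustration only: not the Arrow API, not the real JSON layout.

def dictionary_encode(values):
    """Return (dictionary, indices) for a list of strings; nulls stay as None."""
    dictionary = []   # unique values in first-seen order
    positions = {}    # value -> position in the dictionary
    indices = []      # one entry per input value
    for v in values:
        if v is None:
            indices.append(None)  # null lives in the index vector
            continue
        if v not in positions:
            positions[v] = len(dictionary)
            dictionary.append(v)
        indices.append(positions[v])
    return dictionary, indices


dictionary, indices = dictionary_encode(["foo", "bar", None, "foo"])

# The field schema carries the dictionary (value) type, not the index type.
field = {
    "name": "col",
    "type": "utf8",  # type of the dictionary values
    "nullable": True,
    "dictionary": {"id": 0, "index_type": "int8"},  # assumed key names
}
```

Here `dictionary` holds the utf8 values and `indices` holds the integer positions, with the null kept in the index vector.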
[~wesmckinn] and [~julienledem] I have a couple of questions:
1) The "name" field in the dictionary is meaningless, right? It is not part of
the RecordBatch message. In Arrow Java, when writing, it is whatever name the
user gave the dictionary vector when initializing it. When reading, the
dictionary vector takes the name of the first Field that has a dictionary
encoding. Would it be better to overwrite any name with something standard
like "DICT#", where # is the dictionary id?
2) Does it make sense for the dictionary field to be nullable? In Java, the
dictionary field's nullable flag is taken from the first field that uses that
encoding. Should nullable only be allowed to be false, with this enforced when
setting the dictionary field? Of course, the encoded index field can still be
nullable.
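Both proposals in the questions above can be sketched in a few lines of plain Python. This only illustrates what is being asked, not existing Arrow Java behavior; `make_dictionary_field` and its parameters are hypothetical names, and "DICT" followed by the id is the naming scheme suggested in question 1.

```python
# Hypothetical sketch of the two proposals; not existing Arrow behavior.

def make_dictionary_field(dictionary_id, value_type, user_name=None, nullable=False):
    """Build a dictionary field description with a standardized name.

    user_name is accepted but ignored, mirroring the idea of overwriting
    whatever name the user chose. nullable=True is rejected, mirroring the
    idea of only allowing non-nullable dictionary fields.
    """
    if nullable:
        raise ValueError("dictionary field must not be nullable")
    return {
        "name": "DICT{}".format(dictionary_id),  # proposed standard name
        "type": value_type,
        "nullable": False,
    }


dict_field = make_dictionary_field(0, "utf8", user_name="whatever")

# The encoded index field is separate and may still be nullable.
index_field = {"name": "col", "type": "int8", "nullable": True}
```

With this shape, the name a user gives the dictionary vector never reaches the reader, and nullability is expressed only on the index field.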
> Java<->C++ Integration tests for dictionary-encoded vectors
> -----------------------------------------------------------
>
> Key: ARROW-692
> URL: https://issues.apache.org/jira/browse/ARROW-692
> Project: Apache Arrow
> Issue Type: New Feature
> Components: C++, Java - Vectors
> Reporter: Wes McKinney
>