[ https://issues.apache.org/jira/browse/ARROW-692?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16046946#comment-16046946 ]
Wes McKinney commented on ARROW-692:
------------------------------------
So the way that schemas are being reconstructed in Java, at least last time I
checked, doesn't seem quite right to me: the schema is modified post facto.
In C++, the dictionaries are held in the schema, and the schema is immutable.
The reader makes a first pass over the schema metadata to find the dictionary
types, then reads the dictionaries and stores them in a "DictionaryMemo"
object, and then makes a second pass over the schema metadata to reconstruct
the schema. We were discussing this in [~elahrvivaz]'s original patches. I'm
seeing if [~julienledem] has any thoughts on this.
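A minimal sketch of the two-pass idea, assuming flat (non-nested) fields. All
types here are hypothetical simplified stand-ins, not Arrow's real C++ classes;
"DictionaryMemo" only borrows the name of the C++ object described above, and
reading dictionaries from the stream is simulated by calling Add directly.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical schema metadata as it arrives off the wire.
struct FieldMetadata {
  std::string name;
  std::string type;       // value type, e.g. "utf8"
  int64_t dictionary_id;  // -1 if the field is not dictionary-encoded
};

using DictionaryValues = std::vector<std::string>;

// Toy memo: dictionary id -> decoded dictionary values.
class DictionaryMemo {
 public:
  void Add(int64_t id, DictionaryValues values) { dicts_[id] = std::move(values); }
  const DictionaryValues& Get(int64_t id) const { return dicts_.at(id); }

 private:
  std::map<int64_t, DictionaryValues> dicts_;
};

struct Field {
  std::string name;
  std::string type;
  std::shared_ptr<const DictionaryValues> dictionary;  // null if not encoded
};

// Built once with all dictionaries attached; exposes no mutators.
class Schema {
 public:
  explicit Schema(std::vector<Field> fields) : fields_(std::move(fields)) {}
  const std::vector<Field>& fields() const { return fields_; }

 private:
  std::vector<Field> fields_;
};

// Pass 1: find which dictionaries the schema refers to.
std::vector<int64_t> CollectDictionaryIds(const std::vector<FieldMetadata>& meta) {
  std::vector<int64_t> ids;
  for (const auto& f : meta) {
    if (f.dictionary_id >= 0) ids.push_back(f.dictionary_id);
  }
  return ids;
}

// Pass 2 (after the dictionaries are in the memo): construct the schema in
// one shot instead of patching an existing schema afterwards.
Schema BuildSchema(const std::vector<FieldMetadata>& meta, const DictionaryMemo& memo) {
  std::vector<Field> fields;
  for (const auto& f : meta) {
    Field field{f.name, f.type, nullptr};
    if (f.dictionary_id >= 0) {
      field.dictionary = std::make_shared<DictionaryValues>(memo.Get(f.dictionary_id));
    }
    fields.push_back(field);
  }
  return Schema(std::move(fields));
}
```

Usage: call CollectDictionaryIds on the metadata, read each listed dictionary
into the memo, then call BuildSchema. The schema never exists in a
half-populated state.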
One of the artifacts of the way Java is set up right now is that
dictionary encoding in nested subfields is not supported, but we need to
support that (e.g. a List of dictionary-encoded Strings). That's probably a
more invasive refactoring.
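Supporting nested dictionaries mostly means the first pass has to recurse into
child fields. A minimal sketch, assuming a hypothetical metadata type where
each field carries its children (e.g. a List's value field); it is not Arrow's
real API.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical field metadata with nested children.
struct FieldMetadata {
  std::string name;
  std::string type;                     // "list", "struct", "utf8", ...
  int64_t dictionary_id;                // -1 if not dictionary-encoded
  std::vector<FieldMetadata> children;  // e.g. the value field of a List
};

// Depth-first walk: descending into children is what finds a dictionary
// nested under a List or Struct, not only top-level dictionary fields.
void CollectDictionaryIds(const FieldMetadata& field, std::vector<int64_t>* out) {
  if (field.dictionary_id >= 0) out->push_back(field.dictionary_id);
  for (const auto& child : field.children) CollectDictionaryIds(child, out);
}

std::vector<int64_t> CollectDictionaryIds(const std::vector<FieldMetadata>& schema) {
  std::vector<int64_t> ids;
  for (const auto& f : schema) CollectDictionaryIds(f, &ids);
  return ids;
}
```

For a schema with a top-level dictionary-encoded "color" field and a
List<dictionary-encoded String> "tags" field, this returns both dictionary ids
in depth-first order.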
I will change my patch https://github.com/apache/arrow/pull/750 to put the
dictionaries at the top level of the JSON object; that's not such a big deal.
> Java<->C++ Integration tests for dictionary-encoded vectors
> -----------------------------------------------------------
>
> Key: ARROW-692
> URL: https://issues.apache.org/jira/browse/ARROW-692
> Project: Apache Arrow
> Issue Type: New Feature
> Components: C++, Java - Vectors
> Reporter: Wes McKinney
>
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)