[ https://issues.apache.org/jira/browse/ARROW-692?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16046946#comment-16046946 ]

Wes McKinney commented on ARROW-692:
------------------------------------

The way that schemas are being reconstructed in Java, at least the last time I 
checked, doesn't seem quite right to me: the schema is modified post facto. In 
C++, the dictionaries are held in the schema, and the schema is immutable. The 
reader makes a first pass over the schema metadata to find the dictionary 
types, then reads the dictionaries and stores them in a "DictionaryMemo" 
object, and finally makes a second pass over the schema metadata to reconstruct 
the schema. We discussed this in [~elahrvivaz]'s original patches. I'm curious 
whether [~julienledem] has any thoughts on this.
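
The two-pass approach used in C++ can be sketched roughly as follows. This is a minimal Python illustration under stated assumptions, not Arrow's actual C++ or Java API: `Field`, `DictionaryMemo`, `read_schema`, the dict-based metadata, and the `read_dictionary` callback (standing in for reading dictionary batches from the stream) are all hypothetical. Because fields are frozen dataclasses, the schema is built once with its dictionaries attached and is never modified afterwards.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Field:
    """Hypothetical immutable field; dictionary values are attached at construction."""
    name: str
    type: str
    dictionary: Optional[Tuple[str, ...]] = None
    children: Tuple["Field", ...] = ()


class DictionaryMemo:
    """Hypothetical stand-in for a dictionary memo: dictionary id -> values."""

    def __init__(self) -> None:
        self._by_id: Dict[int, Tuple[str, ...]] = {}

    def add(self, dict_id: int, values: Tuple[str, ...]) -> None:
        if dict_id in self._by_id:
            raise ValueError(f"duplicate dictionary id {dict_id}")
        self._by_id[dict_id] = values

    def get(self, dict_id: int) -> Tuple[str, ...]:
        return self._by_id[dict_id]


def collect_dictionary_ids(meta: dict, out: List[int]) -> None:
    """Pass 1: record every dictionary id, recursing into nested children."""
    if meta.get("dictionary_id") is not None:
        out.append(meta["dictionary_id"])
    for child in meta.get("children", []):
        collect_dictionary_ids(child, out)


def build_field(meta: dict, memo: DictionaryMemo) -> Field:
    """Pass 2: build immutable fields, looking dictionaries up in the memo."""
    dict_id = meta.get("dictionary_id")
    return Field(
        name=meta["name"],
        type=meta["type"],
        dictionary=memo.get(dict_id) if dict_id is not None else None,
        children=tuple(build_field(c, memo) for c in meta.get("children", [])),
    )


def read_schema(
    metadata: List[dict], read_dictionary: Callable[[int], Tuple[str, ...]]
) -> Tuple[Field, ...]:
    ids: List[int] = []
    for meta in metadata:
        collect_dictionary_ids(meta, ids)
    memo = DictionaryMemo()
    # Read each dictionary exactly once, even if several fields share an id.
    for dict_id in dict.fromkeys(ids):
        memo.add(dict_id, read_dictionary(dict_id))
    return tuple(build_field(meta, memo) for meta in metadata)


# Usage: a flat dictionary-encoded field plus a list of dictionary-encoded strings.
metadata = [
    {"name": "city", "type": "utf8", "dictionary_id": 0},
    {"name": "tags", "type": "list",
     "children": [{"name": "item", "type": "utf8", "dictionary_id": 1}]},
]
stored = {0: ("Paris", "Oslo"), 1: ("a", "b", "c")}
schema = read_schema(metadata, stored.__getitem__)
```

Since the first pass recurses into children, this shape also covers dictionary-encoded fields nested inside other types.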

One artifact of the way Java is set up right now is that dictionary encoding 
in nested subfields is not supported, but we need to support that (e.g. a List 
of dictionary-encoded String). That will probably require a more invasive 
refactoring.
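
To make the nested case concrete, a List of dictionary-encoded String can be thought of as list offsets over a child array of integer indices, with each distinct string stored once in a dictionary. The sketch below assumes that simplified picture; `decode_list_of_dictionary` is a hypothetical helper, not part of the Java or C++ libraries, and validity bitmaps (nulls) are omitted.

```python
from typing import List, Sequence


def decode_list_of_dictionary(
    offsets: Sequence[int], indices: Sequence[int], dictionary: Sequence[str]
) -> List[List[str]]:
    """Expand list offsets + dictionary indices back into lists of strings.

    offsets has length n + 1; list i spans indices[offsets[i]:offsets[i + 1]].
    Null handling is deliberately left out of this sketch.
    """
    if len(offsets) == 0:
        raise ValueError("offsets must contain at least one entry")
    result = []
    for start, end in zip(offsets[:-1], offsets[1:]):
        # Each child slot holds an index into the shared dictionary.
        result.append([dictionary[i] for i in indices[start:end]])
    return result


# Three lists: ["red", "green"], [], ["green", "blue", "red"]
lists = decode_list_of_dictionary(
    offsets=[0, 2, 2, 5],
    indices=[0, 1, 1, 2, 0],
    dictionary=["red", "green", "blue"],
)
```

The dictionary belongs to the child field, not to the top-level list field, which is why a reader has to look for dictionary types below the top level of the schema.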

I will change my patch https://github.com/apache/arrow/pull/750 to put the 
dictionaries at the top level of the JSON object; that's not such a big deal.
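
As a rough illustration of what moving the dictionaries to the top level means for a consumer, the sketch below assumes a simplified JSON document with top-level `schema`, `dictionaries` and `batches` keys, where each dictionary-encoded field, even one nested inside a list, refers to a top-level dictionary entry by `id`. The key names, the omitted dictionary payloads, and the `check_dictionaries` helper are assumptions for illustration; the exact layout in the pull request may differ.

```python
import json
from typing import Iterator, List


def dictionary_ids_in_fields(fields: List[dict]) -> Iterator[int]:
    """Yield the dictionary id of every dictionary-encoded field, recursing into children."""
    for f in fields:
        if "dictionary" in f:
            yield f["dictionary"]["id"]
        yield from dictionary_ids_in_fields(f.get("children", []))


def check_dictionaries(doc: dict) -> List[int]:
    """Return field dictionary ids that have no matching top-level dictionary entry."""
    available = {d["id"] for d in doc.get("dictionaries", [])}
    return [i for i in dictionary_ids_in_fields(doc["schema"]["fields"]) if i not in available]


# Hypothetical document: a list of dictionary-encoded strings (payloads omitted).
doc = json.loads("""
{
  "schema": {"fields": [
    {"name": "tags", "type": {"name": "list"}, "children": [
      {"name": "item", "type": {"name": "utf8"}, "dictionary": {"id": 0}}]}
  ]},
  "dictionaries": [{"id": 0}],
  "batches": []
}
""")
missing = check_dictionaries(doc)
```

Keeping dictionaries in one top-level list means a reader can load them all before reconstructing any field, regardless of how deeply the referencing field is nested.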

> Java<->C++ Integration tests for dictionary-encoded vectors
> -----------------------------------------------------------
>
>                 Key: ARROW-692
>                 URL: https://issues.apache.org/jira/browse/ARROW-692
>             Project: Apache Arrow
>          Issue Type: New Feature
>          Components: C++, Java - Vectors
>            Reporter: Wes McKinney
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
