[
https://issues.apache.org/jira/browse/ARROW-3144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16596865#comment-16596865
]
Wes McKinney commented on ARROW-3144:
-------------------------------------
The context for this issue is the design of an Arrow-native RPC system. A "get
info" request may return the schema without the dictionaries (which could be
large), and the dictionaries would be sent later when the dataset is actually
requested. Without a better solution at the C++ API level, we cannot
deserialize the schema IPC message until the corresponding dictionary batches
are available.
> [C++] Better solution for cases where dictionaries are unknown at schema
> reconstruction time, or for delta dictionaries
> -----------------------------------------------------------------------------------------------------------------------
>
> Key: ARROW-3144
> URL: https://issues.apache.org/jira/browse/ARROW-3144
> Project: Apache Arrow
> Issue Type: Improvement
> Components: C++
> Reporter: Wes McKinney
> Priority: Major
> Fix For: 0.12.0
>
>
> There are a couple of inter-related issues:
> * Cases where a system might send the schema without the dictionaries, and
> the user wishes to reason about the schema and its types without knowing the
> dictionary values
> * Dictionaries that are changing, e.g., using delta dictionary messages
>
> {{arrow::DictionaryType}} has no "linkage" to any external object. I propose
> adding a "LinkedDictionaryType" or something similar (purely a C++
> construct) that would functionally be a subclass of {{DictionaryType}}. It
> would allow a type to be created that obtains its dictionary later through
> some kind of "dictionary provider" interface. There is something similar in
> Java already. This would allow a dictionary to evolve via delta
> dictionaries, or a dictionary to be retrieved later, e.g., through an RPC
> or IPC layer.
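
To make the proposal above more concrete, here is a minimal, self-contained
sketch of what a "dictionary provider" and a linked type might look like. It
does not use the Arrow C++ library at all: every name in it
(DictionaryProvider, InMemoryDictionaryProvider, LinkedDictionaryType, Lookup,
Replace, AppendDelta) is hypothetical, and dictionaries are modeled as plain
vectors of strings purely for illustration. A real design would presumably
derive from {{DictionaryType}} and hold Arrow arrays instead.

```cpp
#include <cstdint>
#include <memory>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical stand-in for a dictionary's values (strings only, for brevity).
using Dictionary = std::vector<std::string>;

// Hypothetical "dictionary provider": resolves a dictionary id to its values,
// which may only become available after the schema has been reconstructed.
class DictionaryProvider {
 public:
  virtual ~DictionaryProvider() = default;
  // Returns nullptr while the dictionary for this id is still unknown.
  virtual std::shared_ptr<const Dictionary> Lookup(int64_t id) const = 0;
};

// In-memory provider that an RPC or IPC reader could fill in as dictionary
// batches and delta dictionary messages arrive.
class InMemoryDictionaryProvider : public DictionaryProvider {
 public:
  std::shared_ptr<const Dictionary> Lookup(int64_t id) const override {
    auto it = dictionaries_.find(id);
    return it == dictionaries_.end() ? nullptr : it->second;
  }

  // Full dictionary message: replaces any existing values for this id.
  void Replace(int64_t id, Dictionary values) {
    dictionaries_[id] = std::make_shared<Dictionary>(std::move(values));
  }

  // Delta dictionary message: appends values to the existing dictionary.
  // A new snapshot is stored, so previously returned pointers stay unchanged.
  void AppendDelta(int64_t id, const Dictionary& delta) {
    Dictionary merged;
    if (auto current = Lookup(id)) merged = *current;
    merged.insert(merged.end(), delta.begin(), delta.end());
    Replace(id, std::move(merged));
  }

 private:
  std::unordered_map<int64_t, std::shared_ptr<const Dictionary>> dictionaries_;
};

// Hypothetical linked type: it knows its dictionary id at schema time and
// resolves the values lazily through the provider.
class LinkedDictionaryType {
 public:
  LinkedDictionaryType(int64_t id, std::shared_ptr<DictionaryProvider> provider)
      : id_(id), provider_(std::move(provider)) {}

  int64_t id() const { return id_; }
  bool dictionary_known() const { return provider_->Lookup(id_) != nullptr; }
  std::shared_ptr<const Dictionary> dictionary() const {
    return provider_->Lookup(id_);
  }

 private:
  int64_t id_;
  std::shared_ptr<DictionaryProvider> provider_;
};
```

With something along these lines, a schema containing a LinkedDictionaryType
could be built from a "get info" response before any dictionary batch has
arrived; dictionary_known() would report false until the reader calls Replace,
and later AppendDelta calls would become visible through the same type object.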