[
https://issues.apache.org/jira/browse/ARROW-11869?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Todd Farmer reassigned ARROW-11869:
-----------------------------------
Assignee: (was: David Dali Susanibar Arce)
This issue was last updated over 90 days ago, which may be an indication it is
no longer being actively worked. To better reflect the current state, the issue
is being unassigned. Please feel free to re-take assignment of the issue if it
is being actively worked, or if you plan to start that work soon.
> [Java] Support re-emitting dictionaries in ArrowStreamWriter
> ------------------------------------------------------------
>
> Key: ARROW-11869
> URL: https://issues.apache.org/jira/browse/ARROW-11869
> Project: Apache Arrow
> Issue Type: Improvement
> Components: Java
> Reporter: Joris Peeters
> Priority: Minor
>
> The ArrowStreamWriter currently takes a DictionaryProvider at construction
> time and emits the used dicts once.
> However, the streaming format allows for the dictionaries to change between
> record batches. It would be useful to support this mechanism. It can be
> worked around in various ways (e.g. manually re-emitting DictionaryBatches
> between calling writeBatch), but this isn't very pleasant.
> We'd somehow have to reconcile this with the abstract ArrowWriter parent and
> the ArrowFileWriter sibling. In the latter, for example, this mechanism is
> not supported.
> An example solution (but perhaps we can do better) might be to add a virtual
> `writeBatch(Provider provider)` method that throws
> UnsupportedOperationException in ArrowFileWriter and re-emits the used dicts
> in ArrowStreamWriter.
> In the present context I'm only looking at dictionary replacement, not
> dictionary deltas.
>
>
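To make the proposal in the description concrete, here is a minimal, dependency-free sketch of the shape it suggests. Nothing below is the real Arrow API: SketchWriter, SketchStreamWriter and SketchFileWriter are hypothetical stand-ins for ArrowWriter, ArrowStreamWriter and ArrowFileWriter, a Map<Long, List<String>> stands in for a DictionaryProvider, and messages are modelled as strings. It also assumes that only dictionaries whose contents changed are re-emitted, which is one possible reading of "re-emits the used dicts".

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the abstract ArrowWriter parent (not the real Arrow API).
abstract class SketchWriter {
    // Messages written so far, modelled as plain strings for illustration.
    protected final List<String> emitted = new ArrayList<>();

    // Existing behaviour: write a batch against the dictionaries already emitted.
    void writeBatch(String batch) {
        emitted.add("RecordBatch:" + batch);
    }

    // Proposed overload: write a batch, taking dictionaries from `provider`.
    abstract void writeBatch(String batch, Map<Long, List<String>> provider);

    List<String> emitted() {
        return emitted;
    }
}

// Stream variant: dictionaries may be replaced between record batches.
class SketchStreamWriter extends SketchWriter {
    // Last contents emitted per dictionary id, used to detect replacements.
    private final Map<Long, List<String>> lastEmitted = new HashMap<>();

    SketchStreamWriter(Map<Long, List<String>> initial) {
        emitDictionaries(initial);
    }

    @Override
    void writeBatch(String batch, Map<Long, List<String>> provider) {
        emitDictionaries(provider); // re-emit replaced dictionaries first
        super.writeBatch(batch);
    }

    // Assumption: only dictionaries whose contents differ are re-emitted.
    private void emitDictionaries(Map<Long, List<String>> provider) {
        provider.forEach((id, values) -> {
            if (!values.equals(lastEmitted.get(id))) {
                emitted.add("DictionaryBatch:" + id + "=" + values);
                lastEmitted.put(id, List.copyOf(values));
            }
        });
    }
}

// File variant: replacement is not supported, so the overload refuses.
class SketchFileWriter extends SketchWriter {
    @Override
    void writeBatch(String batch, Map<Long, List<String>> provider) {
        throw new UnsupportedOperationException(
            "dictionary replacement is not supported by the file writer");
    }
}

class DictionaryReemitSketch {
    public static void main(String[] args) {
        SketchStreamWriter writer =
            new SketchStreamWriter(Map.of(1L, List.of("a", "b")));
        writer.writeBatch("batch0");
        // Dictionary 1 is replaced, so it is re-emitted before batch1.
        writer.writeBatch("batch1", Map.of(1L, List.of("x", "y", "z")));
        // Unchanged dictionary: nothing is re-emitted before batch2.
        writer.writeBatch("batch2", Map.of(1L, List.of("x", "y", "z")));
        writer.emitted().forEach(System.out::println);
    }
}
```

Keeping the overload abstract in the parent forces each subclass to state its position explicitly, at the cost of a runtime rather than compile-time failure in the file writer. That trade-off is the one the description already flags as worth improving on.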