[ https://issues.apache.org/jira/browse/ARROW-6308?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ji Liu closed ARROW-6308.
-------------------------
Resolution: Invalid
> [Java] Support write interleaved dictionaries and batches in IPC stream
> -----------------------------------------------------------------------
>
> Key: ARROW-6308
> URL: https://issues.apache.org/jira/browse/ARROW-6308
> Project: Apache Arrow
> Issue Type: Bug
> Components: Java
> Reporter: Ji Liu
> Assignee: Ji Liu
> Priority: Major
>
> Per the discussions in the threads below, and as the spec
> ([http://arrow.apache.org/docs/format/IPC.html#streaming-format]) describes,
> dictionary batches and record batches can be interleaved as long as a record
> batch does not reference a dictionary before that dictionary has been sent.
> [https://github.com/apache/arrow/pull/4960]
> [https://github.com/apache/arrow/pull/5146]
> Since ARROW-6040, interleaved dictionaries and batches can be parsed, but it
> is currently impossible to write data in this format.
> The following cases should be supported:
> i. Record batches with one dictionary-encoded column, S:
> # Schema
> # RecordBatch: S=[null, null, null, null]
> # DictionaryBatch: ['abc', 'efg']
> # RecordBatch: S=[0, 1, 0, 1]
> ii. Record batches with two dictionary-encoded columns, S1 and S2:
> # Schema
> # DictionaryBatch S1: ['ab', 'cd']
> # RecordBatch: S1=[0, 1, 0, 1], S2=[null, null, null, null]
> # DictionaryBatch S2: ['cc', 'dd']
> # RecordBatch: S1=[0, 1, 0, 1], S2=[0, 1, 0, 1]
> This issue records the problem; the work should be done after a discussion
> on the mailing list.
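The ordering rule behind the two cases above can be made concrete with a small, self-contained Java model. This is a hypothetical sketch, not Arrow's Java API: `Message`, `DictionaryBatch`, `RecordBatch`, `isValidOrder` and the other names are illustrative only. It assumes that "referencing" a dictionary means using a non-null index into it, keys dictionaries by column name rather than by dictionary id, and omits the Schema message.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical model of an IPC stream's message order (not Arrow's Java API).
 * Dictionaries are keyed by column name instead of dictionary id, and the
 * Schema message is omitted.
 */
public class InterleavedStreamSketch {
    sealed interface Message permits DictionaryBatch, RecordBatch {}

    /** A dictionary batch for one dictionary-encoded column. */
    record DictionaryBatch(String column, List<String> values) implements Message {}

    /** A record batch mapping each dictionary-encoded column to its indices (null = null slot). */
    record RecordBatch(Map<String, List<Integer>> columns) implements Message {}

    /** Builds index lists that may contain nulls (List.of rejects null elements). */
    static List<Integer> ints(Integer... values) {
        return Arrays.asList(values);
    }

    /**
     * Returns true if every non-null index in every record batch points into a
     * dictionary batch that appeared earlier in the stream.
     */
    static boolean isValidOrder(List<Message> stream) {
        Map<String, Integer> dictionarySizes = new HashMap<>();
        for (Message message : stream) {
            if (message instanceof DictionaryBatch dict) {
                // A later dictionary batch for the same column replaces the earlier one.
                dictionarySizes.put(dict.column(), dict.values().size());
            } else if (message instanceof RecordBatch batch) {
                for (Map.Entry<String, List<Integer>> column : batch.columns().entrySet()) {
                    Integer size = dictionarySizes.get(column.getKey());
                    for (Integer index : column.getValue()) {
                        if (index == null) {
                            continue; // a null slot references no dictionary entry
                        }
                        if (size == null || index < 0 || index >= size) {
                            return false; // dictionary not sent yet, or index out of range
                        }
                    }
                }
            }
        }
        return true;
    }

    /** Case i from the issue: one dictionary-encoded column S. */
    static List<Message> caseI() {
        return List.of(
                new RecordBatch(Map.of("S", ints(null, null, null, null))),
                new DictionaryBatch("S", List.of("abc", "efg")),
                new RecordBatch(Map.of("S", ints(0, 1, 0, 1))));
    }

    /** Case ii from the issue: two dictionary-encoded columns S1 and S2. */
    static List<Message> caseII() {
        return List.of(
                new DictionaryBatch("S1", List.of("ab", "cd")),
                new RecordBatch(Map.of("S1", ints(0, 1, 0, 1), "S2", ints(null, null, null, null))),
                new DictionaryBatch("S2", List.of("cc", "dd")),
                new RecordBatch(Map.of("S1", ints(0, 1, 0, 1), "S2", ints(0, 1, 0, 1))));
    }

    public static void main(String[] args) {
        System.out.println("case i valid: " + isValidOrder(caseI()));
        System.out.println("case ii valid: " + isValidOrder(caseII()));
        // Counter-example: S is used before its dictionary batch has been sent.
        List<Message> invalid = List.of(
                new RecordBatch(Map.of("S", ints(0, 1, 0, 1))),
                new DictionaryBatch("S", List.of("abc", "efg")));
        System.out.println("counter-example valid: " + isValidOrder(invalid));
    }
}
```

Running `main` prints `true` for both cases from the issue and `false` for the counter-example. A writer that supports interleaving would need to enforce an equivalent check (or simply emit each dictionary before the first batch that uses it) when deciding where to place dictionary batches in the stream.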
--
This message was sent by Atlassian Jira
(v8.3.4#803005)