[
https://issues.apache.org/jira/browse/ARROW-4083?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16734757#comment-16734757
]
Uwe L. Korn commented on ARROW-4083:
------------------------------------
I think this will be something that really confuses users and leads to
problems. I would rather stick to DictionaryType being a real type. For the
described use case, I would instead expect the user to emit a set of
RecordBatches in their API. These batches will not all have to share the same
schema, but the schemas should be evolvable into each other. This way we keep
the ChunkedArray as simple as it currently is, but use the RecordBatch as the
base for passing around data that doesn't have exactly the same schema.
> [C++] Allowing ChunkedArrays to contain a mix of DictionaryArray and dense
> Array (of the dictionary type)
> ---------------------------------------------------------------------------------------------------------
>
> Key: ARROW-4083
> URL: https://issues.apache.org/jira/browse/ARROW-4083
> Project: Apache Arrow
> Issue Type: Improvement
> Components: C++
> Reporter: Wes McKinney
> Priority: Major
> Fix For: 0.13.0
>
>
> In some applications we may receive a stream of some dictionary encoded data
> followed by some non-dictionary encoded data. For example this happens in
> Parquet files when the dictionary reaches a certain configurable size
> threshold.
> We should think about how we can model this in our in-memory data structures,
> and how it can flow through to relevant computational components (e.g.
> certain data flow observers -- like an Aggregation -- might need to be able
> to process either a dense or a dictionary-encoded version of a particular
> array in the same stream).
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)