[
https://issues.apache.org/jira/browse/ARROW-7779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17085995#comment-17085995
]
Wes McKinney edited comment on ARROW-7779 at 4/17/20, 7:11 PM:
---------------------------------------------------------------
Sorry, it's not a type. Here would be the representation (simplified) in
Flatbuffers
{code}
Field<
name: "movie_genres",
type: Array
dictionary: id=1
child[0]: Field<
name: "item",
type: String
dictionary: id=0
>
>
{code}
From an algebraic standpoint, the way that dictionary encoding is implemented
is as:
{code}
Array <
indices: Array,
dictionary: Array
>
{code}
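As a rough illustration of this indices-plus-dictionary view, here is a minimal pure-Python sketch. The `DictArray` class and `decode` helper are hypothetical names made up for this example, not part of any Arrow API. It only shows that nothing in the model itself prevents the dictionary from being dictionary-encoded too.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class DictArray:
    """Hypothetical stand-in for a dictionary-encoded array (not an Arrow API)."""
    indices: List[int]
    # The dictionary is either plain values or another dictionary-encoded array.
    dictionary: Union[List[str], "DictArray"]


def decode(arr):
    """Expand a plain list or a (possibly nested) DictArray into plain values."""
    if isinstance(arr, list):
        return arr
    values = decode(arr.dictionary)
    return [values[i] for i in arr.indices]


genres = DictArray(indices=[1, 0, 1], dictionary=["comedy", "drama"])
decoded = decode(genres)

# A dictionary that is itself dictionary-encoded is easy to build in memory.
nested = DictArray(indices=[0, 0], dictionary=genres)
nested_decoded = decode(nested)
```

Here `decoded` is `["drama", "comedy", "drama"]`, and `nested_decoded` looks up its values through both levels of indices.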
So if we disallow "dictionary" from itself containing dictionary-encoded data
(which is not difficult to construct in-memory), then we have to do data
sanitization either at time of array construction or upon writing to IPC. Both
of those options are icky but the percentage of users that will be harmed by
them is small.
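One possible reading of "data sanitization" is collapsing the nested encoding so that the dictionary holds only plain values. The sketch below assumes that interpretation. `DictArray` and `sanitize` are hypothetical names for illustration, not Arrow functions, and an actual implementation might instead decode the dictionary values outright.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class DictArray:
    """Hypothetical stand-in for a dictionary-encoded array (not an Arrow API)."""
    indices: List[int]
    dictionary: Union[List[str], "DictArray"]


def sanitize(arr: DictArray) -> DictArray:
    """Collapse nested dictionary encoding so the dictionary is plain values."""
    inner = arr.dictionary
    if isinstance(inner, list):
        return arr  # already flat, nothing to do
    inner = sanitize(inner)
    # Compose the two lookups: outer index -> inner index -> value.
    composed = [inner.indices[i] for i in arr.indices]
    return DictArray(indices=composed, dictionary=inner.dictionary)


nested = DictArray(
    indices=[2, 0],
    dictionary=DictArray(indices=[1, 0, 1], dictionary=["comedy", "drama"]),
)
flat = sanitize(nested)
```

The same pass could run either when the array is constructed or just before it is written to IPC, which are the two options weighed above.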
was (Author: wesmckinn):
Sorry, it's not a type. Here would be the representation (simplified) in
Flatbuffers
{code}
Field<
name: "movie_genres",
type: Array<
child[0]: Field<
name: "item",
type: String
dictionary: id=0
>
>
dictionary: id=1
>
{code}
From an algebraic standpoint, the way that dictionary encoding is implemented
is as:
{code}
Array <
indices: Array,
dictionary: Array
>
{code}
So if we disallow "dictionary" from itself containing dictionary-encoded data
(which is not difficult to construct in-memory), then we have to do data
sanitization either at time of array construction or upon writing to IPC. Both
of those options are icky but the percentage of users that will be harmed by
them is small.
> [Format] Enable integration tests for dictionaries-within-dictionaries
> ----------------------------------------------------------------------
>
> Key: ARROW-7779
> URL: https://issues.apache.org/jira/browse/ARROW-7779
> Project: Apache Arrow
> Issue Type: Improvement
> Components: Format, Integration
> Reporter: Wes McKinney
> Priority: Major
> Fix For: 1.0.0
>
>
> The integration test is implemented but currently disabled for all
> implementations