[ 
https://issues.apache.org/jira/browse/ARROW-7779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17085995#comment-17085995
 ] 

Wes McKinney edited comment on ARROW-7779 at 4/17/20, 7:11 PM:
---------------------------------------------------------------

Sorry, it's not a type. Here would be the (simplified) representation in 
Flatbuffers:

{code}
Field<
  name: "movie_genres",
  type: Array,
  dictionary: id=1,
  child[0]: Field<
    name: "item",
    type: String,
    dictionary: id=0
  >
>
{code}

From an algebraic standpoint, dictionary encoding is implemented as:

{code}
Array <
  indices: Array,
  dictionary: Array
>
{code}

So if we disallow "dictionary" from itself containing dictionary-encoded data 
(which is not difficult to construct in-memory), then we have to do data 
sanitization either at time of array construction or upon writing to IPC. Both 
of those options are icky, but the percentage of users who will be harmed by 
them is small.
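
To make the trade-off concrete, here is a minimal Python sketch of the algebraic structure above and of what "data sanitization" could look like. It is an illustration only: `DictArray`, `decode`, and `sanitize` are hypothetical names, not part of any Arrow implementation, and nulls are ignored for brevity.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class DictArray:
    """Hypothetical stand-in for Array<indices, dictionary> (not an Arrow API)."""
    indices: List[int]
    dictionary: Union[list, "DictArray"]


def decode(arr):
    """Materialize logical values, resolving nested dictionaries recursively."""
    if not isinstance(arr, DictArray):
        return list(arr)
    values = decode(arr.dictionary)
    return [values[i] for i in arr.indices]


def sanitize(arr):
    """Return an equivalent DictArray whose dictionary is not dictionary-encoded."""
    if not isinstance(arr, DictArray):
        return arr
    return DictArray(arr.indices, decode(arr.dictionary))


# Inner array: a dictionary-encoded column of strings.
inner = DictArray(indices=[0, 1, 0, 2], dictionary=["comedy", "drama", "horror"])
# Outer array: its "dictionary" is itself dictionary-encoded.
outer = DictArray(indices=[3, 0, 1], dictionary=inner)

# Sanitizing decodes the inner dictionary so only one level of encoding remains.
flat = sanitize(outer)
```

In this sketch the sanitization step could run either in the constructor or just before serialization, which corresponds to the two options mentioned above; either way it costs a full decode of the inner dictionary.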


was (Author: wesmckinn):
Sorry, it's not a type. Here would be the (simplified) representation in 
Flatbuffers:

{code}
Field<
  name: "movie_genres",
  type: Array<
    child[0]: Field<
      name: "item",
      type: String,
      dictionary: id=0
    >
  >,
  dictionary: id=1
>
{code}

From an algebraic standpoint, dictionary encoding is implemented as:

{code}
Array <
  indices: Array,
  dictionary: Array
>
{code}

So if we disallow "dictionary" from itself containing dictionary-encoded data 
(which is not difficult to construct in-memory), then we have to do data 
sanitization either at time of array construction or upon writing to IPC. Both 
of those options are icky, but the percentage of users who will be harmed by 
them is small.

> [Format] Enable integration tests for dictionaries-within-dictionaries
> ----------------------------------------------------------------------
>
>                 Key: ARROW-7779
>                 URL: https://issues.apache.org/jira/browse/ARROW-7779
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: Format, Integration
>            Reporter: Wes McKinney
>            Priority: Major
>             Fix For: 1.0.0
>
>
> The integration test is implemented but currently disabled for all 
> implementations



