[ https://issues.apache.org/jira/browse/HIVE-8909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14218286#comment-14218286 ]

Ryan Blue commented on HIVE-8909:
---------------------------------

Yes. It implements the backward-compatibility rules for reading lists in existing data:

1. If the repeated field is not a group, then its type is the element type and 
elements are required.
2. If the repeated field is a group with multiple fields, then its type is the 
element type and elements are required.
3. If the repeated field is a group with one field and is named either "array" 
or the LIST-annotated group's name with "_tuple" appended, then the repeated 
type is the element type and elements are required.
4. Otherwise, the repeated field's type is the element type with the repeated 
field's repetition.
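As a rough illustration, the four rules can be sketched in Java. This is a minimal, self-contained sketch, not the code in HIVE-8909-1.patch: Field, Repetition, Element and resolveElement are hypothetical stand-ins for Parquet's schema classes. Rule 4 is read here the way the standard three-level list layout works: the repeated group is only a wrapper, and its single field is the element.

```java
import java.util.List;

// Hypothetical, simplified model of a Parquet schema; not parquet-mr's API.
final class ListElementRules {

    enum Repetition { REQUIRED, OPTIONAL, REPEATED }

    // A schema node. In this sketch a node with no children is a primitive.
    static final class Field {
        final String name;
        final Repetition repetition;
        final List<Field> children;

        Field(String name, Repetition repetition, List<Field> children) {
            this.name = name;
            this.repetition = repetition;
            this.children = children;
        }

        boolean isGroup() { return !children.isEmpty(); }
    }

    // The field that holds each list element, and whether elements are required.
    static final class Element {
        final Field field;
        final boolean required;

        Element(Field field, boolean required) {
            this.field = field;
            this.required = required;
        }
    }

    // listName: name of the LIST-annotated group.
    // repeated: the single repeated field inside that group.
    static Element resolveElement(String listName, Field repeated) {
        // Rule 1: a repeated primitive is the element; elements are required.
        if (!repeated.isGroup()) {
            return new Element(repeated, true);
        }
        // Rule 2: a repeated group with several fields is the (struct) element.
        if (repeated.children.size() > 1) {
            return new Element(repeated, true);
        }
        // Rule 3: a one-field group named "array" or "<listName>_tuple"
        // is itself the element (a one-field struct); elements are required.
        if (repeated.name.equals("array")
                || repeated.name.equals(listName + "_tuple")) {
            return new Element(repeated, true);
        }
        // Rule 4, as read in this sketch: the repeated group is a wrapper and
        // its single field is the element, keeping that field's repetition.
        Field inner = repeated.children.get(0);
        return new Element(inner, inner.repetition == Repetition.REQUIRED);
    }
}
```

For example, in the three-level form `optional group my_list (LIST) { repeated group list { optional binary element (UTF8); } }`, rules 1 to 3 do not match, so the sketch selects `element` as the element field and treats elements as nullable.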

It also structures the converters to match the other projects: LIST and MAP 
groups use ElementConverter and KeyValueConverter, respectively. The list 
converter supports these rules while still producing the ArrayWritable 
structure the SerDe expects (confirmed by tests that pass on both trunk and 
this patch).
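A rough sketch of that converter split, under loud assumptions: the ElementConverter and KeyValueConverter below are simplified, hypothetical stand-ins that only borrow the names. They do not extend parquet-mr's GroupConverter, and they build plain Object[] values rather than the ArrayWritable structure the SerDe expects. The point is the shape: a LIST group gets one converter that gathers elements between start() and end(), and a MAP group gets one that gathers key/value pairs.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins for the list and map converters; not Hive's classes.
final class ConverterSketch {

    // Receives one finished value from a child converter.
    interface Parent { void add(Object value); }

    // Collects the elements of one list; start()/end() bracket one list value.
    static final class ElementConverter implements Parent {
        private final Parent parent;
        private List<Object> elements;

        ElementConverter(Parent parent) { this.parent = parent; }

        void start() { elements = new ArrayList<>(); }

        // Called once per element by whichever converter reads the element field.
        @Override public void add(Object element) { elements.add(element); }

        void end() { parent.add(elements.toArray()); }
    }

    // Collects one map; each repeated key/value group contributes one pair.
    static final class KeyValueConverter {
        private final Parent parent;
        private List<Object[]> entries;
        private Object key, value;

        KeyValueConverter(Parent parent) { this.parent = parent; }

        void start() { entries = new ArrayList<>(); }

        void startEntry() { key = null; value = null; }
        void setKey(Object k) { key = k; }
        void setValue(Object v) { value = v; }
        void endEntry() { entries.add(new Object[] {key, value}); }

        // Hands the parent an array of [key, value] pairs.
        void end() { parent.add(entries.toArray(new Object[0][])); }
    }
}
```

Which Parquet field feeds ElementConverter.add() would be decided by the list rules; the sketch leaves that choice to the caller.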

Repeated groups that aren't annotated are still deserialized into lists as 
before, but I changed how this is done to put less work on 
DataWritableGroupConverter, which is now called StructConverter. A struct needs 
to support repeated inner groups, but rather than keeping a second array of 
objects, StructConverter passes its start() and end() calls to the converters 
for repeated children, which use them to add the correct object to the struct. 
This approach is easier to follow and produces the same result. (By all means, 
please verify this!)
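The start()/end() delegation can be sketched as follows. StructConverter, RepeatedFieldConverter, parentStart() and parentEnd() are hypothetical names for a minimal, self-contained illustration; they do not reproduce Hive's classes or parquet-mr's converter API. Each repeated child opens a fresh list when its struct starts and writes the finished list into its own struct slot when the struct ends, so the struct itself never keeps a second array.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of a struct converter delegating start()/end().
final class StructSketch {

    // Converter for an unannotated repeated child at a fixed struct index.
    static final class RepeatedFieldConverter {
        private final StructConverter struct;
        private final int index;
        private List<Object> values;

        RepeatedFieldConverter(StructConverter struct, int index) {
            this.struct = struct;
            this.index = index;
        }

        // Called from the enclosing struct's start(): begin an empty list.
        void parentStart() { values = new ArrayList<>(); }

        // One occurrence of the repeated field.
        void addValue(Object v) { values.add(v); }

        // Called from the enclosing struct's end(): store the finished list.
        void parentEnd() { struct.set(index, values.toArray()); }
    }

    static final class StructConverter {
        private final int fieldCount;
        private final List<RepeatedFieldConverter> repeatedChildren = new ArrayList<>();
        private Object[] fields;

        StructConverter(int fieldCount) { this.fieldCount = fieldCount; }

        // Registers a converter for a repeated child field at the given index.
        RepeatedFieldConverter repeatedChild(int index) {
            RepeatedFieldConverter c = new RepeatedFieldConverter(this, index);
            repeatedChildren.add(c);
            return c;
        }

        void set(int index, Object value) { fields[index] = value; }

        void start() {
            fields = new Object[fieldCount];
            for (RepeatedFieldConverter c : repeatedChildren) c.parentStart();
        }

        void end() {
            // Repeated children write their lists into the struct's slots.
            for (RepeatedFieldConverter c : repeatedChildren) c.parentEnd();
        }

        Object[] result() { return fields; }
    }
}
```

In this sketch, a record with no occurrences of the repeated field still gets an empty list in its slot, because parentStart() always runs when the struct starts.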

> Hive doesn't correctly read Parquet nested types
> ------------------------------------------------
>
>                 Key: HIVE-8909
>                 URL: https://issues.apache.org/jira/browse/HIVE-8909
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Ryan Blue
>            Assignee: Ryan Blue
>         Attachments: HIVE-8909-1.patch
>
>
> Parquet's Avro and Thrift object models don't produce the same Parquet type 
> representation for lists and maps that Hive does. In the Parquet community, 
> we've defined what should be written, along with backward-compatibility rules 
> for existing data written by parquet-avro and parquet-thrift, in PARQUET-113. 
> We need to implement those rules in the Hive Converter classes.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
