[jira] [Commented] (PARQUET-113) Clarify parquet-format specification for LIST and MAP structures.

Philippe Girolami (JIRA) Tue, 03 Feb 2015 00:37:06 -0800

    [ 
https://issues.apache.org/jira/browse/PARQUET-113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14302953#comment-14302953
 ]


Philippe Girolami commented on PARQUET-113:
-------------------------------------------

Appreciate the quick reply, @rdblue! It's quite clear.

1) Based on my testing and understanding, working with Lists across these 4
technologies (Protobuf 2.5 / Parquet 1.6.0rc3 / Hive 0.14 / Spark 1.2) is not
currently possible:
* SparkSQL doesn't understand 3-level lists unless they are annotated (LIST),
and there is no way to annotate the Parquet file built from protobuf
* so I'm using 2-level lists in the proto schema, with the appropriate field
naming scheme (which has the additional benefit of letting me null the list)
* but Hive doesn't understand 2-level lists, so I can't read the table in Hive
{code}
// 2-level encoding, written to Parquet using Spark.
// Can be read in Spark 1.2 but not in Hive 0.14.
// This is a fake, contrived example...
message Wheel {
        required bool   is_front = 1;
        required bool   is_left  = 2;
        optional int32  size     = 3;
        optional string color    = 4;
}
message WheelInner {
        repeated Wheel array = 1;
}
message Car {
        required uint64 id   = 1;
        required string make = 2;
        optional WheelInner wheels = 3;
}
{code}

I see that you implemented HIVE-8909: since the spec includes handling of "old"
2-level data, I'm assuming that once Hive 0.15 is released, I'll be able to
read the data. I'm guessing that's what
HiveCollectionConverter.isElementType() does.
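
To make my guess concrete, here is a rough Python sketch of the kind of check I
imagine such a method performs. It is not the Hive code: the Field class and
the function name are hypothetical, and the three conditions are my reading of
the backward-compatibility rules discussed in this ticket (a repeated
non-group, a repeated group with several fields, or a single-field group named
"array" or "<list name>_tuple" is itself the element).

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Field:
    """Hypothetical stand-in for a Parquet schema node (not a parquet-mr class)."""
    name: str
    is_group: bool = False
    children: List["Field"] = field(default_factory=list)


def repeated_field_is_element(repeated: Field, list_name: str) -> bool:
    """Return True when the repeated field is itself the list element (2-level)."""
    if not repeated.is_group:
        return True  # a repeated primitive is the element
    if len(repeated.children) > 1:
        return True  # a multi-field group is the element
    if repeated.name in ("array", list_name + "_tuple"):
        return True  # legacy names used by older writers
    return False     # otherwise assume 3-level: the single child is the element


wheel_fields = [Field("is_front"), Field("is_left"), Field("size"), Field("color")]

# Shape produced by the Car/WheelInner schema above: wheels -> repeated array
two_level = Field("array", is_group=True, children=wheel_fields)

# Shape of a 3-level list: wheels -> repeated list -> element
three_level = Field(
    "list",
    is_group=True,
    children=[Field("element", is_group=True, children=wheel_fields)],
)

print(repeated_field_is_element(two_level, "wheels"))    # True
print(repeated_field_is_element(three_level, "wheels"))  # False
```

If that reading is right, my Wheel data would be picked up because the repeated
group has several fields, independently of it being named "array".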

2) I've been able to use Maps by building the protobuf schema as seen below. I
suppose that's why you didn't have to test for special field names for Maps
too: "map", "keyvalues".
{code}
// 3-level encoding, written to Parquet using Spark.
// Can be read in Spark 1.2 AND Hive 0.14.
message MapEntryOk {
        required string key = 1;
        optional string value = 2;
}
message MyMapOk {
        repeated MapEntryOk map = 1;
}
message MyEntityOk {
        optional string name = 1;
        optional MyMapOk keyvalues = 2;
}
{code}

3) Regarding Spark, do you know if anyone is working on implementing these
LIST/MAP rules there? I couldn't find a ticket.

> Clarify parquet-format specification for LIST and MAP structures.
> -----------------------------------------------------------------
>
>                 Key: PARQUET-113
>                 URL: https://issues.apache.org/jira/browse/PARQUET-113
>             Project: Parquet
>          Issue Type: Bug
>          Components: parquet-format, parquet-mr
>            Reporter: Ryan Blue
>            Assignee: Ryan Blue
>
> There are incompatibilities in the way that some parquet object models 
> translate nested structures annotated by LIST and MAP / MAP_KEY_VALUE. We 
> need to define clearly what the structures should look like and how to 
> interpret existing structures, including what must be supported to read 
> current parquet-avro, parquet-thrift, etc. files.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
