[
https://issues.apache.org/jira/browse/PARQUET-113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14302953#comment-14302953
]
Philippe Girolami commented on PARQUET-113:
-------------------------------------------
Appreciate the quick reply, @rdblue! It's quite clear.
1) Based on my testing and understanding, working with Lists across these 4
technologies is not currently possible (Protobuf 2.5/Parquet 1.6.0rc3/Hive
0.14/Spark 1.2):
* SparkSQL doesn't understand 3-level lists unless they are annotated (LIST),
and there is no way to annotate the Parquet file built from protobuf
* so I'm using 2-level lists in the proto schema with the appropriate field
naming scheme (which has the additional benefit of letting me null the
list)
* but Hive doesn't understand 2-level lists, so I can't read the table in Hive
{code}
// 2-level encoding, written to Parquet using Spark. Can be read in Spark 1.2
// but not in Hive 0.14.
// This is a fake, contrived example...
message Wheel {
  required bool is_front = 1;
  required bool is_left = 2;
  optional int32 size = 3;
  optional string color = 4;
}
message WheelInner {
  // The field name "array" marks the repeated group as the element type.
  repeated Wheel array = 1;
}
message Car {
  required uint64 id = 1;
  required string make = 2;
  // Optional wrapper, so the whole list can be null.
  optional WheelInner wheels = 3;
}
{code}
I see that you implemented HIVE-8909: since the specs include handling "old"
2-level data, I'm assuming that once Hive 0.15 is released, I'll be able to
read the data. I'm guessing that's what
HiveCollectionConverter.isElementType() does.
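To make sure I read the spec correctly, here is my understanding of the backward-compatibility rules for deciding whether a repeated field is itself the list element (2-level) or only a wrapper around it (3-level). This is a rough Python sketch, not the actual Hive code: `Field` and `repeated_field_is_element` are hypothetical names I made up for illustration, and I'm assuming the rules are the ones in the parquet-format LIST section.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Field:
    """Hypothetical stand-in for a Parquet schema node (not a real library type)."""
    name: str
    children: List["Field"] = field(default_factory=list)
    is_group: bool = False


def repeated_field_is_element(repeated: Field, parent_name: str) -> bool:
    """Return True if the repeated field is the element itself (2-level list)."""
    # Rule 1: a repeated primitive is the element type.
    if not repeated.is_group:
        return True
    # Rule 2: a repeated group with several fields is the element type.
    if len(repeated.children) > 1:
        return True
    # Rule 3: a single-field group named "array" or "<parent>_tuple" is the
    # element type (naming used by existing 2-level writers).
    if repeated.name == "array" or repeated.name == parent_name + "_tuple":
        return True
    # Rule 4: otherwise treat it as the 3-level wrapper around the element.
    return False


# The "wheels" list from the proto schema above: repeated group named "array".
wheel = Field("array", is_group=True, children=[
    Field("is_front"), Field("is_left"), Field("size"), Field("color")])
print(repeated_field_is_element(wheel, "wheels"))   # True

# A standard 3-level list: repeated group "list" with a single "element" child.
wrapper = Field("list", is_group=True, children=[Field("element")])
print(repeated_field_is_element(wrapper, "wheels"))  # False
```

If that reading is right, my Wheel data falls under the second rule regardless of the field name, and the "array" name only matters for single-field element groups.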
2) I've been able to use Maps by building the protobuf schema as seen below. I
suppose that's why you didn't have to test for special field names for Maps
too: "map", "keyvalues".
{code}
// 3-level encoding, written to Parquet using Spark. Can be read in Spark 1.2
// AND Hive 0.14.
message MapEntryOk {
  required string key = 1;
  optional string value = 2;
}
message MyMapOk {
  repeated MapEntryOk map = 1;
}
message MyEntityOk {
  optional string name = 1;
  optional MyMapOk keyvalues = 2;
}
{code}
3) Regarding Spark, do you know if anyone is working on implementing these? I
couldn't find a ticket.
> Clarify parquet-format specification for LIST and MAP structures.
> -----------------------------------------------------------------
>
> Key: PARQUET-113
> URL: https://issues.apache.org/jira/browse/PARQUET-113
> Project: Parquet
> Issue Type: Bug
> Components: parquet-format, parquet-mr
> Reporter: Ryan Blue
> Assignee: Ryan Blue
>
> There are incompatibilities in the way that some parquet object models
> translate nested structures annotated by LIST and MAP / MAP_KEY_VALUE. We
> need to define clearly what the structures should look like and how to
> interpret existing structures, including what must be supported to read
> current parquet-avro, parquet-thrift, etc. files.