Sergio Peña created HIVE-9502:
---------------------------------
Summary: Parquet cannot read Map types from files written with Hive <= 0.12
Key: HIVE-9502
URL: https://issues.apache.org/jira/browse/HIVE-9502
Project: Hive
Issue Type: Bug
Reporter: Sergio Peña
Assignee: Sergio Peña
When reading a Parquet file written by Hive <= 0.12, the following error is
thrown:
{noformat}
Caused by: java.lang.ArrayIndexOutOfBoundsException: 1
    at org.apache.hadoop.hive.ql.io.parquet.serde.AbstractParquetMapInspector.getMap(AbstractParquetMapInspector.java:73)
    at org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.serialize(LazySimpleSerDe.java:519)
    at org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.serializeField(LazySimpleSerDe.java:443)
    at org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.serialize(LazySimpleSerDe.java:427)
    at org.apache.hadoop.hive.ql.exec.FileSinkOperator.processOp(FileSinkOperator.java:582)
    at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:796)
    at org.apache.hadoop.hive.ql.exec.LimitOperator.processOp(LimitOperator.java:51)
    at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:796)
    at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:87)
    at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:796)
    at org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:92)
    at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:796)
    at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:539)
    ... 9 more
{noformat}
This is because old versions of Hive (<= 0.12) write Map types using the
following schema:
{noformat}
optional group m1 (MAP_KEY_VALUE) {
  repeated group map {
    required binary key;
    optional binary value;
  }
}
{noformat}
PARQUET-113 describes new annotations for Parquet nested types:
https://github.com/rdblue/incubator-parquet-format/blob/PARQUET-113-add-list-and-map-spec/LogicalTypes.md#maps
With these annotations, the correct schema is now:
{noformat}
optional group m1f (MAP) {
  repeated group map (MAP_KEY_VALUE) {
    required binary key;
    optional binary value;
  }
}
{noformat}
Hive should remain backward compatible with the old schema as well, so that Map columns in files written by Hive <= 0.12 can still be read.
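
To illustrate the compatibility check, here is a minimal sketch of how a reader might tell the two layouts apart by looking at where the annotations sit. It is not the actual fix and does not use the Hive or parquet-mr APIs: the `Group` class, the `MapLayout` enum, and the `classify` method are hypothetical names invented for this example, and the rule "exactly one inner group with two fields" is an assumption drawn from the two schemas above.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical, simplified model of a Parquet schema node.
// This is NOT the parquet-mr or Hive API; it only mirrors the schemas quoted above.
class MapSchemaCompat {

    static final class Group {
        final String name;
        final String annotation;   // "MAP", "MAP_KEY_VALUE", or null when absent
        final List<Group> children; // empty for leaf fields such as key and value

        Group(String name, String annotation, Group... children) {
            this.name = name;
            this.annotation = annotation;
            this.children = Arrays.asList(children);
        }
    }

    enum MapLayout { CURRENT, LEGACY, NOT_A_MAP }

    // Classify a group by the position of its annotations (assumed rule):
    //   CURRENT: outer group is MAP and wraps a single two-field group.
    //   LEGACY:  outer group is MAP_KEY_VALUE and the inner group has no annotation,
    //            as written by Hive <= 0.12.
    static MapLayout classify(Group outer) {
        if (outer.children.size() != 1) {
            return MapLayout.NOT_A_MAP;
        }
        Group inner = outer.children.get(0);
        if (inner.children.size() != 2) {
            return MapLayout.NOT_A_MAP;
        }
        if ("MAP".equals(outer.annotation)) {
            return MapLayout.CURRENT;
        }
        if ("MAP_KEY_VALUE".equals(outer.annotation) && inner.annotation == null) {
            return MapLayout.LEGACY;
        }
        return MapLayout.NOT_A_MAP;
    }

    public static void main(String[] args) {
        Group key = new Group("key", null);
        Group value = new Group("value", null);

        // Schema written by Hive <= 0.12
        Group legacy = new Group("m1", "MAP_KEY_VALUE", new Group("map", null, key, value));
        // Schema following the PARQUET-113 annotations
        Group current = new Group("m1f", "MAP", new Group("map", "MAP_KEY_VALUE", key, value));

        System.out.println(classify(legacy));
        System.out.println(classify(current));
    }
}
```

A reader that accepts both `CURRENT` and `LEGACY` would treat the inner repeated group as the key/value pair container in either case; only the annotation placement differs between the two.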