Mala Chikka Kempanna created PARQUET-26:
-------------------------------------------
Summary: Parquet doesn't recognize the nested array type in MAP as ArrayWritable.
Key: PARQUET-26
URL: https://issues.apache.org/jira/browse/PARQUET-26
Project: Parquet
Issue Type: Bug
Reporter: Mala Chikka Kempanna
Attachments: test.dat
When trying to insert Hive data of type MAP<string, array<int>> into Parquet, the following error is thrown:
Caused by: parquet.io.ParquetEncodingException: This should be an ArrayWritable or MapWritable: org.apache.hadoop.hive.ql.io.parquet.writable.BinaryWritable@c644ef1c
    at org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriter.writeData(DataWritableWriter.java:86)
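The exception message suggests writeData dispatches on the runtime Writable type and rejects anything that is not an ArrayWritable or MapWritable. Below is a hedged, self-contained sketch of that check (the class names are minimal stand-ins, not the actual Hive source): the nested array value apparently arrives wrapped as a BinaryWritable, so the guard throws.

```java
// Minimal stand-ins for the Writable hierarchy (assumptions, not Hive's classes).
interface Writable {}
class ArrayWritable implements Writable {}
class MapWritable implements Writable {}
class BinaryWritable implements Writable {}  // hypothetical stand-in for the failing value

class ParquetEncodingException extends RuntimeException {
    ParquetEncodingException(String msg) { super(msg); }
}

class DataWritableWriterSketch {
    // Sketch of the type guard implied by the stack trace: nested complex
    // values must arrive as ArrayWritable or MapWritable; anything else
    // (here, the array nested inside the MAP, seen as BinaryWritable) throws.
    static void writeData(Writable value) {
        if (!(value instanceof ArrayWritable) && !(value instanceof MapWritable)) {
            throw new ParquetEncodingException(
                "This should be an ArrayWritable or MapWritable: " + value);
        }
        // ... actual record serialization would happen here
    }
}
```

Under this reading, the fix would be for the MAP value inspector to hand the nested array to the writer as an ArrayWritable rather than a BinaryWritable.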
The problem is reproducible with the following steps (the relevant test data, test.dat, is attached):
1.
CREATE TABLE test_hive (
node string,
stime string,
stimeutc string,
swver string,
moid MAP <string,string>,
pdfs MAP <string,array<int>>,
utcdate string,
motype string)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '|'
COLLECTION ITEMS TERMINATED BY ','
MAP KEYS TERMINATED BY '=';
2.
LOAD DATA LOCAL INPATH '/root/38388/test.dat' INTO TABLE test_hive;
3.
CREATE TABLE test_parquet(
pdfs MAP <string,array<int>>
)
STORED AS PARQUET;
4.
INSERT INTO TABLE test_parquet SELECT pdfs FROM test_hive;
--
This message was sent by Atlassian JIRA
(v6.2#6252)