[ https://issues.apache.org/jira/browse/HIVE-8359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14181138#comment-14181138 ]
Frédéric TERRAZZONI commented on HIVE-8359:
-------------------------------------------

Here is how you can create the sample Avro table:
{code}
CREATE EXTERNAL TABLE my_test_table(`avreau_col_1` map<string,string>)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
STORED AS
  INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
  OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
LOCATION '/user/fterrazzoni/my_test_table'
TBLPROPERTIES("avro.schema.url"="hdfs://localhost:9000/user/fterrazzoni/path/to/avro.schema");
{code}
Where:
- /user/fterrazzoni/my_test_table/ is a directory containing the file I provided (map_null_val.avro).
- /user/fterrazzoni/path/to/avro.schema is a text file containing the Avro schema corresponding to the data, which is:
{code}
{"type":"record","name":"dku_record_0","namespace":"com.dataiku.dss","doc":"","fields":[{"name":"avreau_col_1","type":["null",{"type":"map","values":["null","string"]}],"default":null}]}
{code}

Now, you can simply try:
{code}
SELECT * from my_test_table;
{code}
... and observe that the result is correct. However, if you copy that into a Parquet table:
{code}
CREATE TABLE test_parquet STORED AS PARQUET AS SELECT * FROM my_test_table;
SELECT * FROM test_parquet;
{code}
... the output is corrupted.


> Map containing null values are not correctly written in Parquet files
> ---------------------------------------------------------------------
>
>                 Key: HIVE-8359
>                 URL: https://issues.apache.org/jira/browse/HIVE-8359
>             Project: Hive
>          Issue Type: Bug
>          Components: File Formats
>    Affects Versions: 0.13.1
>            Reporter: Frédéric TERRAZZONI
>         Attachments: map_null_val.avro
>
>
> I tried to write a map<string,string> column to a Parquet file. The table should
> contain:
> {code}
> {"key3":"val3","key4":null}
> {"key3":"val3","key4":null}
> {"key1":null,"key2":"val2"}
> {"key3":"val3","key4":null}
> {"key3":"val3","key4":null}
> {code}
> ... and when you run a query like {code}SELECT * from mytable{code}
> we can see that the table is corrupted:
> {code}
> {"key3":"val3"}
> {"key4":"val3"}
> {"key3":"val2"}
> {"key4":"val3"}
> {"key1":"val3"}
> {code}
> I've not been able to read the Parquet file in our software afterwards, and
> consequently I suspect it to be corrupted.
> For those who are interested, I generated this Parquet table from an Avro
> file.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)