[ https://issues.apache.org/jira/browse/HIVE-8359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14181138#comment-14181138 ]

Frédéric TERRAZZONI commented on HIVE-8359:
-------------------------------------------

Here is how you can create the sample Avro table:
{code}
CREATE EXTERNAL TABLE my_test_table(`avreau_col_1` map<string,string>)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
STORED AS
  INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
  OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
LOCATION '/user/fterrazzoni/my_test_table'
TBLPROPERTIES ("avro.schema.url"="hdfs://localhost:9000/user/fterrazzoni/path/to/avro.schema");
{code}

Where:
- /user/fterrazzoni/my_test_table/ is a directory containing the file I provided (map_null_val.avro).
- /user/fterrazzoni/path/to/avro.schema is a text file containing the Avro schema that matches the data:
{code}
{
  "type": "record",
  "name": "dku_record_0",
  "namespace": "com.dataiku.dss",
  "doc": "",
  "fields": [
    {
      "name": "avreau_col_1",
      "type": ["null", {"type": "map", "values": ["null", "string"]}],
      "default": null
    }
  ]
}
{code}

Now, simply run:
{code}SELECT * FROM my_test_table;{code}
... and observe that the result is correct.

However, if you copy the data into a Parquet table:
{code}
CREATE TABLE test_parquet STORED AS PARQUET AS SELECT * FROM my_test_table;
SELECT * FROM test_parquet;
{code}
... the output is corrupted: the keys and values returned no longer match the source data.

> Map containing null values are not correctly written in Parquet files
> ---------------------------------------------------------------------
>
>                 Key: HIVE-8359
>                 URL: https://issues.apache.org/jira/browse/HIVE-8359
>             Project: Hive
>          Issue Type: Bug
>          Components: File Formats
>    Affects Versions: 0.13.1
>            Reporter: Frédéric TERRAZZONI
>         Attachments: map_null_val.avro
>
>
> I tried to write a map<string,string> column to a Parquet file. The table
> should contain:
> {code}
> {"key3":"val3","key4":null}
> {"key3":"val3","key4":null}
> {"key1":null,"key2":"val2"}
> {"key3":"val3","key4":null}
> {"key3":"val3","key4":null}
> {code}
> ... but a query such as {code}SELECT * FROM mytable{code} shows that the
> table is corrupted:
> {code}
> {"key3":"val3"}
> {"key4":"val3"}
> {"key3":"val2"}
> {"key4":"val3"}
> {"key1":"val3"}
> {code}
> Our own software could not read the resulting Parquet file either, so I
> suspect the file itself is corrupted.
> For those interested, I generated this Parquet table from an Avro file.
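Comparing the expected and observed listings in the issue description suggests a pattern: the observed values are exactly the non-null values of the expected rows, in order, and the observed keys are the first five map keys of the expected rows, in order. The Python sketch below checks that pairing these two flattened streams, one entry per row, reproduces the corrupted output exactly. `misaligned_read` is a hypothetical helper that models the symptom only; it does not reflect Hive's or Parquet's actual code and does not establish the root cause.

```python
# Rows the issue says the table should contain (dicts keep insertion order in Python 3.7+).
EXPECTED = [
    {"key3": "val3", "key4": None},
    {"key3": "val3", "key4": None},
    {"key1": None, "key2": "val2"},
    {"key3": "val3", "key4": None},
    {"key3": "val3", "key4": None},
]

# Rows actually returned by SELECT * on the Parquet table, as reported.
OBSERVED = [
    {"key3": "val3"},
    {"key4": "val3"},
    {"key3": "val2"},
    {"key4": "val3"},
    {"key1": "val3"},
]


def misaligned_read(rows):
    """Hypothetical model of the symptom, not Hive's real read/write path.

    Flattens every map key into one stream and only the non-null values
    into another (as if null values were dropped on write), then rebuilds
    one single-entry map per row by reading both streams in lockstep.
    """
    keys = [k for row in rows for k in row]
    values = [v for row in rows for v in row.values() if v is not None]
    return [{keys[i]: values[i]} for i in range(len(rows))]


if __name__ == "__main__":
    print(misaligned_read(EXPECTED) == OBSERVED)
```

The match only shows that the reported output is consistent with keys and non-null values drifting out of alignment; confirming the cause would require inspecting the Parquet writer itself.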



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)