[ https://issues.apache.org/jira/browse/HIVE-7850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14110435#comment-14110435 ]
Sathish commented on HIVE-7850:
-------------------------------
Hi Ryan, I agree that Hive should support lists with null elements. But can you give some idea of the cases where lists with no null elements are generated? Whenever Parquet files are generated from Avro files, most of them have the array schema below:
{code}
optional group name (LIST) {
  repeated string array_element;
}
{code}
Could you suggest how best we can support both kinds of arrays? This patch only fixes arrays with no null entries.

> Hive Query failed if the data type is array<string> with parquet files
> ----------------------------------------------------------------------
>
>                 Key: HIVE-7850
>                 URL: https://issues.apache.org/jira/browse/HIVE-7850
>             Project: Hive
>          Issue Type: Bug
>          Components: Serializers/Deserializers
>    Affects Versions: 0.14.0, 0.13.1
>            Reporter: Sathish
>            Assignee: Sathish
>              Labels: parquet, serde
>             Fix For: 0.14.0
>
>         Attachments: HIVE-7850.1.patch, HIVE-7850.patch
>
>
> * Created a Parquet file from an Avro file that has one array data type; the rest are primitive types. Avro schema of the array data type, e.g.:
> {code}
> { "name" : "action", "type" : [ { "type" : "array", "items" : "string" }, "null" ] }
> {code}
> * Created an external Hive table with the array type as below:
> {code}
> create external table paraArray (action Array<String>) partitioned by (partitionid int)
> row format serde 'parquet.hive.serde.ParquetHiveSerDe'
> stored as
>   inputformat 'parquet.hive.MapredParquetInputFormat'
>   outputformat 'parquet.hive.MapredParquetOutputFormat'
> location '/testPara';
> alter table paraArray add partition(partitionid=1) location '/testPara';
> {code}
> * Ran the following query (select action from paraArray limit 10); the MapReduce jobs fail with the following exception.
> {code}
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row [Error getting row data with exception java.lang.ClassCastException: parquet.hive.writable.BinaryWritable$DicBinaryWritable cannot be cast to org.apache.hadoop.io.ArrayWritable
>   at parquet.hive.serde.ParquetHiveArrayInspector.getList(ParquetHiveArrayInspector.java:125)
>   at org.apache.hadoop.hive.serde2.SerDeUtils.buildJSONString(SerDeUtils.java:315)
>   at org.apache.hadoop.hive.serde2.SerDeUtils.buildJSONString(SerDeUtils.java:371)
>   at org.apache.hadoop.hive.serde2.SerDeUtils.getJSONString(SerDeUtils.java:236)
>   at org.apache.hadoop.hive.serde2.SerDeUtils.getJSONString(SerDeUtils.java:222)
>   at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:665)
>   at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:144)
>   at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
>   at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:405)
>   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:336)
>   at org.apache.hadoop.mapred.Child$4.run(Child.java:270)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1126)
>   at org.apache.hadoop.mapred.Child.main(Child.java:264)
> ]
>   at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:671)
>   at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:144)
>   ... 8 more
> {code}
> This issue was posted long back on the Parquet issues list. Since it relates to the Parquet Hive SerDe, I have created the Hive issue here. The details and history are in the link: https://github.com/Parquet/parquet-mr/issues/281.

--
This message was sent by Atlassian JIRA
(v6.2#6252)
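For reference, the two Parquet list layouts under discussion can be put side by side. This is a sketch based on the LIST annotation in the parquet-format specification; the field and element names are illustrative, not taken from the files in this issue:

{code}
// 2-level layout produced by some writers: the repeated field IS the
// element, so there is no way to mark an individual element as null.
optional group name (LIST) {
  repeated binary array_element (UTF8);
}

// 3-level layout from the parquet-format spec: the repeated group wraps
// an optional element, so individual elements may be null.
optional group name (LIST) {
  repeated group list {
    optional binary element (UTF8);
  }
}
{code}

A reader that wants to support both kinds of arrays generally has to detect which layout it is looking at (for example, by checking whether the repeated field is a group containing a single optional child) rather than assuming one shape.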