[ 
https://issues.apache.org/jira/browse/HIVE-11977?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14941648#comment-14941648
 ] 

Ashutosh Chauhan commented on HIVE-11977:
-----------------------------------------

If I understand correctly, reason you are suggesting is Reader should be 
resilient to such invalid files, If so, I think better place to skip such files 
is Avro's native reader itself. 
That way all of Avro user gets advantage of this, not just Hive. e.g, If you 
read that data directly (i.e., outside of Hive) than this fix will again be 
needed.

> Hive should handle an external avro table with zero length files present
> ------------------------------------------------------------------------
>
>                 Key: HIVE-11977
>                 URL: https://issues.apache.org/jira/browse/HIVE-11977
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Aaron Dossett
>            Assignee: Aaron Dossett
>         Attachments: HIVE-11977-2.patch, HIVE-11977.patch
>
>
> If a zero length file is in the top level directory housing an external avro 
> table,  all hive queries on the table fail.
> This issue is that org.apache.hadoop.hive.ql.io.avro.AvroGenericRecordReader 
> creates a new org.apache.avro.file.DataFileReader and DataFileReader throws 
> an exception when trying to read an empty file (because the empty file lacks 
> the magic number marking it as avro).  
> AvroGenericRecordReader should detect an empty file and then behave 
> reasonably.
> Caused by: java.io.IOException: Not a data file.
> at org.apache.avro.file.DataFileStream.initialize(DataFileStream.java:102)
> at org.apache.avro.file.DataFileReader.<init>(DataFileReader.java:97)
> at 
> org.apache.hadoop.hive.ql.io.avro.AvroGenericRecordReader.<init>(AvroGenericRecordReader.java:81)
> at 
> org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat.getRecordReader(AvroContainerInputFormat.java:51)
> at 
> org.apache.hadoop.hive.ql.io.HiveInputFormat.getRecordReader(HiveInputFormat.java:246)
> ... 25 more



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to