[ https://issues.apache.org/jira/browse/HIVE-11977?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14941603#comment-14941603 ]

Aaron Dossett commented on HIVE-11977:
--------------------------------------

[~ashutoshc] Thank you for your response! My thought is that any process 
generating this data could have failure scenarios that result in zero-length 
files; that was how I initially ran into this issue.  A file was opened on 
HDFS and "held" as a zero-length file before data was written to it, and the 
writing process crashed before any data arrived.  The consequence in these 
cases, that the entire table becomes unreadable (in my experience), seems 
disproportionate to the actual problem.  Likewise, a process that deletes 
empty files could expose small windows where the table was unusable.

Would adding a warning and/or an option like 
{{hive.exec.orc.skip.corrupt.data}} be more appropriate than silently ignoring 
the files?  This is my first foray into Hive internals, so perhaps that ORC 
option is not an exact parallel to this situation, but as a user it seems 
similar.
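For illustration only (this is a sketch, not the attached patch, and the class and method names are hypothetical): the guard I have in mind would check the split's length before handing the file to Avro's {{DataFileReader}}, which throws "Not a data file." for any file lacking the Avro magic bytes:

```java
import java.io.File;
import java.io.IOException;

public class EmptyFileCheck {
    // A zero-length file cannot contain the 4-byte Avro container magic,
    // so DataFileReader would reject it with "Not a data file."
    static boolean isEmptyFile(File f) {
        return f.length() == 0;
    }

    public static void main(String[] args) throws IOException {
        // A freshly created temp file is zero-length, modeling the
        // crashed-writer scenario described above.
        File f = File.createTempFile("part-00000", ".avro");
        f.deleteOnExit();
        // Instead of constructing DataFileReader (and failing the whole
        // query), the record reader could skip the file and/or log a warning.
        System.out.println(isEmptyFile(f) ? "skip" : "read");
    }
}
```

The real check would presumably live in {{AvroGenericRecordReader}} and consult the HDFS file status rather than java.io.File, but the shape of the decision is the same.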

Thank you again for the response and your feedback!

> Hive should handle an external avro table with zero length files present
> ------------------------------------------------------------------------
>
>                 Key: HIVE-11977
>                 URL: https://issues.apache.org/jira/browse/HIVE-11977
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Aaron Dossett
>            Assignee: Aaron Dossett
>         Attachments: HIVE-11977-2.patch, HIVE-11977.patch
>
>
> If a zero-length file is present in the top-level directory housing an 
> external Avro table, all Hive queries on the table fail.
> The issue is that org.apache.hadoop.hive.ql.io.avro.AvroGenericRecordReader 
> creates a new org.apache.avro.file.DataFileReader, and DataFileReader throws 
> an exception when trying to read an empty file (because the empty file lacks 
> the magic number marking it as Avro).
> AvroGenericRecordReader should detect an empty file and then behave 
> reasonably.
> Caused by: java.io.IOException: Not a data file.
> at org.apache.avro.file.DataFileStream.initialize(DataFileStream.java:102)
> at org.apache.avro.file.DataFileReader.<init>(DataFileReader.java:97)
> at org.apache.hadoop.hive.ql.io.avro.AvroGenericRecordReader.<init>(AvroGenericRecordReader.java:81)
> at org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat.getRecordReader(AvroContainerInputFormat.java:51)
> at org.apache.hadoop.hive.ql.io.HiveInputFormat.getRecordReader(HiveInputFormat.java:246)
> ... 25 more



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
