[ https://issues.apache.org/jira/browse/HIVE-11977?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14941603#comment-14941603 ]
Aaron Dossett commented on HIVE-11977:
--------------------------------------
[~ashutoshc] Thank you for your response! My thought is that any process for
generating this data could have failure scenarios that result in zero-length
files; that was the case when I initially ran into this issue. A file was
opened on HDFS and "held" as a zero-length file before data was written to it,
and the writing process crashed before any data could be written. The
consequence of these cases, that the entire table is unreadable (based on my
experience), seems disproportionate to the actual problem. Likewise, a process
that deletes empty files would still leave small windows in which the table is
unusable.
Would adding a warning and/or adding an option like
{{hive.exec.orc.skip.corrupt.data}} be more appropriate than silently ignoring
the files? This is my first foray into Hive internals, so perhaps that ORC
option is not an exact parallel to this situation, but as a user it seems
similar.
Thank you again for the response and your feedback!
> Hive should handle an external avro table with zero length files present
> ------------------------------------------------------------------------
>
> Key: HIVE-11977
> URL: https://issues.apache.org/jira/browse/HIVE-11977
> Project: Hive
> Issue Type: Bug
> Reporter: Aaron Dossett
> Assignee: Aaron Dossett
> Attachments: HIVE-11977-2.patch, HIVE-11977.patch
>
>
> If a zero-length file is in the top-level directory housing an external Avro
> table, all Hive queries on the table fail.
> The issue is that org.apache.hadoop.hive.ql.io.avro.AvroGenericRecordReader
> creates a new org.apache.avro.file.DataFileReader, and DataFileReader throws
> an exception when trying to read an empty file (because the empty file lacks
> the magic number marking it as Avro).
> AvroGenericRecordReader should detect an empty file and then behave
> reasonably.
> Caused by: java.io.IOException: Not a data file.
> at org.apache.avro.file.DataFileStream.initialize(DataFileStream.java:102)
> at org.apache.avro.file.DataFileReader.<init>(DataFileReader.java:97)
> at org.apache.hadoop.hive.ql.io.avro.AvroGenericRecordReader.<init>(AvroGenericRecordReader.java:81)
> at org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat.getRecordReader(AvroContainerInputFormat.java:51)
> at org.apache.hadoop.hive.ql.io.HiveInputFormat.getRecordReader(HiveInputFormat.java:246)
> ... 25 more
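
To make the idea concrete, here is a minimal sketch of what "detect an empty file, warn, and skip it behind an option" could look like. It is not the attached patch and not Hive code: the names ZeroLengthAvroCheck, Kind, classify, shouldRead and skipEmptyFiles are all hypothetical, and it uses java.nio on a local path as a stand-in for the Hadoop FileSystem API. The only outside fact it relies on is that an Avro object container file starts with the four bytes 'O', 'b', 'j', 1.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

// Hypothetical helper for illustration; not part of Hive or of HIVE-11977's patches.
final class ZeroLengthAvroCheck {
    // Avro object container files begin with the magic bytes 'O', 'b', 'j', 1.
    private static final byte[] AVRO_MAGIC = {'O', 'b', 'j', 1};

    enum Kind { EMPTY, AVRO, NOT_AVRO }

    // Classify a file by its length and its first four bytes.
    static Kind classify(Path file) throws IOException {
        if (Files.size(file) == 0) {
            return Kind.EMPTY;
        }
        byte[] header = new byte[AVRO_MAGIC.length];
        try (InputStream in = Files.newInputStream(file)) {
            int read = in.readNBytes(header, 0, header.length);
            if (read < header.length) {
                return Kind.NOT_AVRO; // too short to hold the magic number
            }
        }
        return Arrays.equals(header, AVRO_MAGIC) ? Kind.AVRO : Kind.NOT_AVRO;
    }

    // Returns false (with a warning) only for an empty file when skipping is enabled.
    // Every other case returns true, so a real reader would still fail loudly
    // on a non-empty file that is not Avro.
    static boolean shouldRead(Path file, boolean skipEmptyFiles) throws IOException {
        if (skipEmptyFiles && classify(file) == Kind.EMPTY) {
            System.err.println("WARN: skipping zero-length file " + file);
            return false;
        }
        return true;
    }
}
```

With skipEmptyFiles set to false the behavior is unchanged from today (the reader is still asked to open the empty file), so the flag plays the role that an option in the spirit of hive.exec.orc.skip.corrupt.data would play.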