Lars Volker created AVRO-2045:
---------------------------------
Summary: Avro should warn about corrupt EOF files
Key: AVRO-2045
URL: https://issues.apache.org/jira/browse/AVRO-2045
Project: Avro
Issue Type: Bug
Affects Versions: 1.7.6
Reporter: Lars Volker

When running queries on truncated files, Impala's Avro scanner issues a warning:
{noformat}
WARNINGS: Problem parsing file
hdfs://host.company.com:8020/tmp/datagen/some_db/some_table/col1=A/col2=B/col3=D/col4=C/2017-05-18-18-5-9-876-0.avro
at 1327214080(EOF)
Tried to read 64653 bytes but could only read 16549 bytes. This may indicate
data file corruption. (file
hdfs://host.company.com:8020/tmp/datagen/some_db/some_table/col1=A/col2=B/col3=D/col4=C/2017-05-18-18-5-9-876-0.avro,
byte offset: 1327214080)
{noformat}
Run against the same file, {{avro-tools tojson}} eventually prints the same
number of rows that Impala reads, but it does not print a warning. Instead it
seems to silently swallow the EOFException.

I think it should print a warning as well, so that users can tell that the file
is truncated.
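
To illustrate the requested behavior, here is a minimal sketch in Python. It is not Avro's code and does not parse the Avro container format. It assumes a made-up layout in which every block is a 4-byte big-endian length followed by that many payload bytes. The names `read_blocks` and `encode_block` are hypothetical. The point is only that a short read at EOF is reported, with the expected and actual byte counts and the offset, instead of being swallowed:

```python
import io
import struct
import warnings


def encode_block(payload):
    """Encode one block in the hypothetical layout: 4-byte length + payload."""
    return struct.pack(">I", len(payload)) + payload


def read_blocks(stream):
    """Yield block payloads; warn (rather than stay silent) on truncation.

    Assumption: a simplified length-prefixed layout, NOT the real Avro
    object container format.
    """
    while True:
        offset = stream.tell()
        header = stream.read(4)
        if not header:
            # Clean EOF exactly on a block boundary: nothing to report.
            return
        if len(header) < 4:
            warnings.warn(
                f"Tried to read 4 bytes but could only read {len(header)} "
                f"bytes at byte offset {offset}. "
                "This may indicate data file corruption."
            )
            return
        (expected,) = struct.unpack(">I", header)
        payload = stream.read(expected)
        if len(payload) < expected:
            # The file ends in the middle of a block: keep the rows read so
            # far, but tell the user that the input was truncated.
            warnings.warn(
                f"Tried to read {expected} bytes but could only read "
                f"{len(payload)} bytes at byte offset {offset}. "
                "This may indicate data file corruption."
            )
            return
        yield payload
```

With two encoded blocks and the last three bytes cut off, the reader still returns the first block and additionally emits one warning, which is the behavior I would expect from {{avro-tools tojson}}.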