Lars Volker created AVRO-2045:
---------------------------------

             Summary: Avro should warn about corrupt EOF files
                 Key: AVRO-2045
                 URL: https://issues.apache.org/jira/browse/AVRO-2045
             Project: Avro
          Issue Type: Bug
    Affects Versions: 1.7.6
            Reporter: Lars Volker


When running queries on truncated files, Impala's Avro scanner issues a warning:

{noformat}
WARNINGS: Problem parsing file hdfs://host.company.com:8020/tmp/datagen/some_db/some_table/col1=A/col2=B/col3=D/col4=C/2017-05-18-18-5-9-876-0.avro at 1327214080(EOF)
Tried to read 64653 bytes but could only read 16549 bytes. This may indicate data file corruption. (file hdfs://host.company.com:8020/tmp/datagen/some_db/some_table/col1=A/col2=B/col3=D/col4=C/2017-05-18-18-5-9-876-0.avro, byte offset: 1327214080)
{noformat}

{{avro-tools tojson}} eventually prints the same number of rows that Impala 
reads, but does not print a warning. Instead, it seems to silently swallow the 
EOFException.

I think avro-tools should print a warning when it hits an unexpected EOF, 
rather than stopping silently.
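
To illustrate the requested behavior, here is a minimal sketch of a reader that reports a short read instead of stopping quietly. It is not Avro's code and does not use Avro's container format: the 4-byte length-prefixed framing and the names {{read_blocks}} and {{demo}} are assumptions made up for this example.

```python
import io
import struct
import warnings


def read_blocks(stream):
    """Yield payloads from [4-byte big-endian length][payload] blocks.

    Hypothetical framing for illustration only; this is NOT Avro's
    object container file layout.
    """
    while True:
        offset = stream.tell()
        header = stream.read(4)
        if not header:
            return  # clean end of file, exactly on a block boundary
        if len(header) < 4:
            warnings.warn(f"truncated block header at byte offset {offset}")
            return
        (expected,) = struct.unpack(">I", header)
        payload = stream.read(expected)
        if len(payload) < expected:
            # Report the short read rather than ending the iteration silently.
            warnings.warn(
                f"tried to read {expected} bytes but could only read "
                f"{len(payload)} bytes at byte offset {offset + 4}; "
                "the file may be truncated"
            )
            return
        yield payload


def demo():
    """Read one intact block followed by one truncated block."""
    intact = struct.pack(">I", 3) + b"abc"
    truncated = struct.pack(">I", 5) + b"de"  # header claims 5 bytes, only 2 follow
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        blocks = list(read_blocks(io.BytesIO(intact + truncated)))
    return blocks, [str(w.message) for w in caught]
```

In this sketch the complete rows are still returned, as with {{avro-tools tojson}} today, but the caller also sees one warning that names the expected and actual byte counts.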




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
