[ https://issues.apache.org/jira/browse/AVRO-2944?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17278049#comment-17278049 ]
Andrew Olson commented on AVRO-2944:
------------------------------------
[~rskraba] Yes, we did experience the same symptom in production: it was an
infinite loop. I'm not certain of every Avro version in use there, but it is
likely a mix of 1.7.x and 1.8.x releases. We were already able to patch this
locally (in a custom fork of the Crunch library, see CRUNCH-698) by replacing
the call to DataFileReader#openReader with a private static method that uses
a DataFileReader constructor instead, which works because those constructors
are public. That definitively fixed it, and things are stable now. My concern
with the previous correction is that it does not handle the unexpected EOF/-1
case, even though the JIRA description (strangely) suggested it should. That
case matters especially when reading Avro files from S3 using S3A, which is
what we are doing. Unfortunately, our investigation could not determine whether
the problem was simply a throttled read or a premature EOF: we needed to fix it
as soon as possible, and once a couple of long days of troubleshooting had
traced it to this Avro issue, we weren't able to gather further diagnostic
information. Hopefully that's enough context to decide whether a follow-up
release is necessary. It isn't something we would rapidly consume and deploy;
we're comfortable with the Crunch change as a long-term solution here.
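A correct header read has to cope with both possibilities the investigation could not tell apart: a short (for example throttled) read that returns fewer bytes than requested, and a -1 from a premature EOF. The following is a minimal sketch of such a loop, assuming a plain `java.io.InputStream` rather than Avro's `SeekableInput`. `MagicHeaderReader`, `readMagic` and `chunked` are hypothetical names for illustration, not Avro or Crunch APIs, and this is not the upstream fix.

```java
import java.io.ByteArrayInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.io.SequenceInputStream;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper for illustration only; not an Avro or Crunch API.
final class MagicHeaderReader {
  private MagicHeaderReader() {}

  /**
   * Reads exactly {@code length} header bytes from {@code in}.
   * Short reads are retried; a -1 before the buffer is full becomes an EOFException.
   */
  static byte[] readMagic(InputStream in, int length) throws IOException {
    byte[] magic = new byte[length];
    int offset = 0; // total number of bytes read so far
    while (offset < length) {
      // Per the InputStream contract, read(...) with len > 0 returns at least 1 byte or -1,
      // so every iteration either makes progress or stops.
      int n = in.read(magic, offset, length - offset);
      if (n < 0) {
        throw new EOFException(
            "Expected " + length + " header bytes but the stream ended after " + offset);
      }
      offset += n; // accumulate the count instead of overwriting the offset
    }
    return magic;
  }

  /** Builds a stream whose read(...) calls return at most one chunk each, mimicking short reads. */
  static InputStream chunked(byte[]... chunks) {
    List<InputStream> parts = new ArrayList<>();
    for (byte[] chunk : chunks) {
      parts.add(new ByteArrayInputStream(chunk));
    }
    return new SequenceInputStream(Collections.enumeration(parts));
  }
}
```

For example, `readMagic(chunked(new byte[] {'O', 'b'}, new byte[] {'j', 1}), 4)` assembles the Avro magic (`Obj` followed by byte 1) across two short reads, while a stream holding only 2 bytes raises `EOFException` instead of spinning. On a plain `InputStream`, `DataInputStream.readFully` provides the same short-read and EOF semantics without a hand-written loop.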
> DataFileReader has incorrect logic reading magic header
> -------------------------------------------------------
>
> Key: AVRO-2944
> URL: https://issues.apache.org/jira/browse/AVRO-2944
> Project: Apache Avro
> Issue Type: Bug
> Components: java
> Affects Versions: 1.9.2
> Reporter: Mick Jermsurawong
> Assignee: Mick Jermsurawong
> Priority: Major
> Fix For: 1.10.1
>
> Original Estimate: 1h
> Remaining Estimate: 1h
>
> When creating a reader using the static method, which includes checking the
> magic header, we currently read 4 bytes but the read offset is not updated
> correctly.
> [https://github.com/apache/avro/blob/328c539afc77da347ec52be1e112a6a7c371143b/lang/java/avro/src/main/java/org/apache/avro/file/DataFileReader.java#L61-L62]
> When the input stream reads fewer bytes than expected, this gets stuck in the
> loop until the end of the file. Or, if the input stream returns -1 for EOF, as
> S3AInputStream does, the read hangs in this loop.
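The linked lines are not reproduced here. Assuming, as the description implies, that the loop assigns the return value of `read` to the offset instead of adding it, the sketch below replays that pattern over a stream that returns at most two bytes per call. `MagicLoopTrace`, `traceAssigningLoop` and `chunked` are hypothetical names, and a `SequenceInputStream` over small byte arrays stands in for a short-reading remote stream such as S3A.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.SequenceInputStream;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical demo class; it models the pattern described in the issue, not Avro's exact code.
final class MagicLoopTrace {
  private MagicLoopTrace() {}

  /**
   * Runs a header-read loop that stores the byte count of the LAST read in c instead of
   * adding it to c, and records c after every read. The iteration cap and the c >= 0 check
   * are additions so that this demo always terminates.
   */
  static List<Integer> traceAssigningLoop(InputStream in, int magicLength, int maxIterations)
      throws IOException {
    byte[] magic = new byte[magicLength];
    List<Integer> offsets = new ArrayList<>();
    int c = 0;
    for (int i = 0; i < maxIterations && c >= 0 && c < magic.length; i++) {
      c = in.read(magic, c, magic.length - c); // the illustrated bug: should be c += ...
      offsets.add(c);
    }
    return offsets;
  }

  /** Builds a stream whose read(...) calls return at most one chunk each, mimicking short reads. */
  static InputStream chunked(byte[]... chunks) {
    List<InputStream> parts = new ArrayList<>();
    for (byte[] chunk : chunks) {
      parts.add(new ByteArrayInputStream(chunk));
    }
    return new SequenceInputStream(Collections.enumeration(parts));
  }
}
```

With chunks `{'O', 'b'}`, `{'j', 1}`, `{0, 0}`, `{0, 0}` and a 4-byte magic, the recorded offsets are `[2, 2, 2, 2, -1]`: the offset never reaches 4, the loop consumes the whole stream two bytes at a time, and it finally receives -1 at EOF (the demo stops there only because of its added guard). A stream that returns all 4 bytes in one call records `[4]`, which is why the problem only appears with short reads.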
--
This message was sent by Atlassian Jira
(v8.3.4#803005)