[
https://issues.apache.org/jira/browse/CRUNCH-698?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Andrew Olson updated CRUNCH-698:
--------------------------------
Component/s: IO
> Avro DataFileReader creation can hang
> -------------------------------------
>
> Key: CRUNCH-698
> URL: https://issues.apache.org/jira/browse/CRUNCH-698
> Project: Crunch
> Issue Type: Bug
> Components: Core, IO
> Reporter: Andrew Olson
> Assignee: Josh Wills
> Priority: Major
> Time Spent: 10m
> Remaining Estimate: 0h
>
> A severe Avro bug [AVRO-2944|https://issues.apache.org/jira/browse/AVRO-2944]
> was recently found in the static method for creating a DataFileReader
> instance, where it can get stuck in an infinite loop while trying to read the
> 4-byte "magic" header of the file.
> This was fixed in Avro 1.10.1 but has not yet been backported to any other
> Avro versions. The issue has existed since Avro 1.5, although we have only
> encountered it recently. It does not happen under normal circumstances; there
> has to be some very unusual input stream behavior (partial/throttled read, or
> unexpected EOF) causing it. We've only seen it with the S3AFileSystem's
> S3AInputStream, suddenly starting a few days ago for no apparent reason. Even
> now it is sporadic, happening in a small percentage of job tasks that read
> many S3 files, but often enough to be problematic. An AWS support case is open
> to attempt to find out what could have caused this.
> To avoid an external dependency on a particular Avro version to fix this, we
> can probably just patch this locally in Crunch, since it's only one static
> method and, apart from one legacy constant, everything we need access to in
> the Avro code is public.
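
For illustration only, here is a minimal sketch of reading a 4-byte header so that partial reads and an unexpected EOF cannot cause a hang. It is not the Avro fix or the proposed Crunch patch. The class and method names (MagicHeaderReader, readMagic, hasAvroMagic, OneByteAtATimeStream) are hypothetical, and it uses a plain java.io.InputStream rather than Avro's SeekableInput. The idea is to accumulate the byte count across read calls and to treat a negative return value as end of stream.

```java
import java.io.ByteArrayInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;

/** Hypothetical helper, not part of Avro or Crunch. */
final class MagicHeaderReader {
    // Avro container file magic: 'O', 'b', 'j', then format version 1.
    static final byte[] MAGIC = {(byte) 'O', (byte) 'b', (byte) 'j', (byte) 1};

    /** Reads exactly MAGIC.length bytes, tolerating short reads. */
    static byte[] readMagic(InputStream in) throws IOException {
        byte[] magic = new byte[MAGIC.length];
        int total = 0;
        while (total < magic.length) {
            // read() may return fewer bytes than requested, so accumulate.
            int n = in.read(magic, total, magic.length - total);
            if (n < 0) {
                // End of stream before the header was complete: fail fast
                // instead of retrying forever.
                throw new EOFException("Stream ended after " + total
                        + " of " + magic.length + " header bytes");
            }
            total += n;
        }
        return magic;
    }

    /** Returns true if the stream starts with the Avro magic bytes. */
    static boolean hasAvroMagic(InputStream in) throws IOException {
        return Arrays.equals(MAGIC, readMagic(in));
    }
}

/** Hypothetical test stream that returns at most one byte per bulk read. */
final class OneByteAtATimeStream extends InputStream {
    private final InputStream delegate;

    OneByteAtATimeStream(InputStream delegate) {
        this.delegate = delegate;
    }

    @Override
    public int read() throws IOException {
        return delegate.read();
    }

    @Override
    public int read(byte[] b, int off, int len) throws IOException {
        // Simulate a throttled source by never returning more than one byte.
        return delegate.read(b, off, Math.min(len, 1));
    }
}
```

Wrapping a ByteArrayInputStream in OneByteAtATimeStream simulates the throttled-read case: readMagic still returns all four bytes, and a stream shorter than four bytes produces an EOFException rather than a hang.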