[ https://issues.apache.org/jira/browse/CRUNCH-698?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Andrew Olson resolved CRUNCH-698.
---------------------------------
    Fix Version/s: 1.1.0
       Resolution: Fixed

Pull request has been merged.

> Avro DataFileReader creation can hang
> -------------------------------------
>
>                 Key: CRUNCH-698
>                 URL: https://issues.apache.org/jira/browse/CRUNCH-698
>             Project: Crunch
>          Issue Type: Bug
>          Components: Core, IO
>            Reporter: Andrew Olson
>            Assignee: Josh Wills
>            Priority: Major
>             Fix For: 1.1.0
>
>          Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> A severe Avro bug [AVRO-2944|https://issues.apache.org/jira/browse/AVRO-2944]
> was recently found in the static method for creating a DataFileReader
> instance: it can get stuck in an infinite loop while trying to read the
> 4-byte "magic" header of the file.
> This was fixed in Avro 1.10.1 but has not yet been backported to any other
> Avro versions. The issue has existed since Avro 1.5, although we have only
> encountered it recently. It does not happen under normal circumstances; it
> requires some very unusual input stream behavior (a partial/throttled read,
> or an unexpected EOF). We've only seen it with the S3AFileSystem's
> S3AInputStream, where it suddenly started a few days ago for no apparent
> reason. Even now it is sporadic, happening a small percentage of the time in
> job tasks that read many S3 files, but often enough to be problematic. An AWS
> support case is open to try to find out what could have caused this.
>
> To avoid depending on a particular Avro version for this fix, we can probably
> just patch it locally in Crunch, since it's only one static method and, apart
> from one legacy constant, everything we need access to in the Avro code is
> public.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
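The hang described above comes down to reading a fixed-size header from a stream whose `read` may legally return fewer bytes than requested, or -1 at EOF. The following is a minimal Java sketch of a defensive header read, under stated assumptions: it is not Avro's or Crunch's actual code, and `MagicHeaderReader`, `readMagic`, and `OneByteAtATimeStream` are hypothetical names. It accumulates the offset across short reads and fails fast on EOF instead of looping. The throttled stream is only an assumed stand-in for the kind of partial-read behavior attributed to S3AInputStream; the sample header bytes `'O', 'b', 'j', 1` are the Avro object container file magic.

```java
import java.io.ByteArrayInputStream;
import java.io.EOFException;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;

public class MagicHeaderReader {

  // Hypothetical helper: reads exactly `length` header bytes, tolerating
  // short reads and throwing on EOF rather than spinning forever.
  static byte[] readMagic(InputStream in, int length) throws IOException {
    byte[] magic = new byte[length];
    int offset = 0;
    while (offset < length) {
      int n = in.read(magic, offset, length - offset);
      if (n < 0) {
        // -1 means EOF: more reads will never make progress, so stop here.
        throw new EOFException(
            "Unexpected EOF with " + (length - offset) + " header bytes remaining");
      }
      offset += n; // accumulate the position instead of overwriting it
    }
    return magic;
  }

  // Hypothetical test stream: returns at most one byte per read() call,
  // simulating a throttled/partial-read input stream.
  static class OneByteAtATimeStream extends FilterInputStream {
    OneByteAtATimeStream(InputStream in) {
      super(in);
    }

    @Override
    public int read(byte[] b, int off, int len) throws IOException {
      return super.read(b, off, Math.min(len, 1));
    }
  }

  public static void main(String[] args) throws IOException {
    byte[] file = {'O', 'b', 'j', 1, 42};
    byte[] magic = readMagic(new OneByteAtATimeStream(new ByteArrayInputStream(file)), 4);
    System.out.println(Arrays.toString(magic)); // [79, 98, 106, 1]

    try {
      // A truncated file: only 2 of the 4 header bytes exist.
      readMagic(new ByteArrayInputStream(new byte[] {'O', 'b'}), 4);
    } catch (EOFException e) {
      System.out.println("EOF detected: " + e.getMessage());
    }
  }
}
```

Throwing `EOFException` on a truncated header is a design choice in this sketch: a caller can treat it as a corrupt or incomplete file, whereas a loop that ignores -1 or resets its offset can never terminate.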