[ 
https://issues.apache.org/jira/browse/CRUNCH-698?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Olson updated CRUNCH-698:
--------------------------------
    Component/s: IO

> Avro DataFileReader creation can hang
> -------------------------------------
>
>                 Key: CRUNCH-698
>                 URL: https://issues.apache.org/jira/browse/CRUNCH-698
>             Project: Crunch
>          Issue Type: Bug
>          Components: Core, IO
>            Reporter: Andrew Olson
>            Assignee: Josh Wills
>            Priority: Major
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> A severe Avro bug [AVRO-2944|https://issues.apache.org/jira/browse/AVRO-2944] 
> was recently found in the static method for creating a DataFileReader 
> instance, where it can get stuck in an infinite loop while trying to read the 
> 4-byte "magic" header of the file.
> This was fixed in Avro 1.10.1 but has not yet been backported to any other 
> Avro versions. The issue has existed since Avro 1.5, although we have only 
> encountered it recently. It does not happen in normal circumstances; there 
> has to be some very unusual input stream behavior (partial/throttled read, or unexpected 
> EOF) causing it. We've only seen it with the S3AFileSystem's S3AInputStream, 
> suddenly starting a few days ago for no apparent reason. Even now it is 
> sporadic, happening a small percentage of the time in job tasks that read many 
> S3 files but often enough to be problematic. An AWS support case is open to 
> attempt to find out what could have caused this.
> To avoid the external dependency on a particular Avro version to fix this, we 
> can probably just patch this locally in Crunch since it's only one static 
> method and apart from one legacy constant everything we need access to in the 
> Avro code is public.
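
For illustration, the kind of read loop that tolerates the stream behavior described above (short reads and unexpected EOF) can be sketched as follows. This is a minimal sketch, not the actual Avro or Crunch patch: the class name MagicHeaderReader and the method readMagic are hypothetical, and a plain java.io.InputStream stands in for Avro's seekable input. The key points are that the offset accumulates across reads and that a negative return value ends the loop with an exception instead of spinning.

```java
import java.io.ByteArrayInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper for illustration only; not part of Avro or Crunch.
final class MagicHeaderReader {

  // Reads exactly `length` bytes from the stream, tolerating short reads.
  // Throws EOFException if the stream ends before `length` bytes arrive.
  static byte[] readMagic(InputStream in, int length) throws IOException {
    byte[] magic = new byte[length];
    int offset = 0;
    while (offset < length) {
      // read() may return fewer bytes than requested, or -1 at end of stream.
      int bytesRead = in.read(magic, offset, length - offset);
      if (bytesRead < 0) {
        throw new EOFException(
            "Unexpected EOF with " + (length - offset) + " bytes remaining to read");
      }
      // Accumulate the offset rather than overwriting it with the last read size.
      offset += bytesRead;
    }
    return magic;
  }

  public static void main(String[] args) throws IOException {
    // Simulates a throttled stream that hands back at most one byte per call.
    InputStream throttled =
        new ByteArrayInputStream(new byte[] {'O', 'b', 'j', 1, 42}) {
          @Override
          public synchronized int read(byte[] b, int off, int len) {
            return super.read(b, off, Math.min(len, 1));
          }
        };
    byte[] magic = readMagic(throttled, 4);
    System.out.println("Read " + magic.length + " header bytes");
  }
}
```

With a loop of this shape, a truncated file surfaces as an EOFException that the task can report or retry, rather than a hung reader.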



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
