[ 
https://issues.apache.org/jira/browse/CRUNCH-698?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Olson updated CRUNCH-698:
--------------------------------
    Description: 
A severe Avro bug [AVRO-2944|https://issues.apache.org/jira/browse/AVRO-2944] 
was recently found in the static method that creates a DataFileReader instance: 
it can get stuck in an infinite loop while trying to read the 4-byte "magic" 
header of the file.

The stack trace looks like this:

{noformat}
"main" #1 prio=5 os_prio=0 tid=0x00007f8798027000 nid=0x7d9c runnable [0x00007f87a0924000]
   java.lang.Thread.State: RUNNABLE
        at java.io.DataInputStream.read(DataInputStream.java:149)
        at org.apache.avro.mapred.FsInput.read(FsInput.java:54)
        at org.apache.avro.file.DataFileReader.openReader(DataFileReader.java:55)
        at org.apache.crunch.types.avro.AvroRecordReader.initialize(AvroRecordReader.java:58)
        at org.apache.crunch.impl.mr.run.CrunchRecordReader.nextKeyValue(CrunchRecordReader.java:152)
        at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:571)
        at org.apache.hadoop.mapreduce.task.MapContextImpl.nextKeyValue(MapContextImpl.java:80)
        at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.nextKeyValue(WrappedMapper.java:91)
        at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:802)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:342)
        at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:177)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1893)
        at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:171)
{noformat}

This was fixed in Avro 1.10.1 but has not yet been backported to any other 
Avro versions. The issue has existed since Avro 1.5, although we have only 
recently encountered it. It does not happen under normal circumstances; some 
very unusual input stream behavior (a partial or throttled read, or an 
unexpected EOF) has to trigger it. We've only seen it with the S3AFileSystem's 
S3AInputStream, where it suddenly started a few days ago for no apparent 
reason. Even now it is sporadic, occurring in a small percentage of job tasks 
that read many S3 files, but often enough to be problematic. An AWS support 
case is open to try to find out what could have caused this.

To avoid depending on a particular Avro version for the fix, we can probably 
just patch this locally in Crunch, since it is only one static method and, 
apart from one legacy constant, everything we need access to in the Avro code 
is public.

  was:
A severe Avro bug [AVRO-2944|https://issues.apache.org/jira/browse/AVRO-2944] 
was recently found in the static method for creating a DataFileReader instance, 
where it can get stuck in an infinite loop while trying to read the 4 byte 
"magic" header of the file.

This was fixed in Avro 1.10.1 but has not yet been patched to any other Avro 
versions. The issue has existed since Avro 1.5 although we have encountered it 
recently. It does not happen in normal circumstances, there has to be some very 
unusual input stream behavior (partial/throttled read, or unexpected EOF) 
causing it. We've only seen it with the S3AFileSystem's S3AInputStream, 
suddenly starting a few days ago for no apparent reason. Even now it is 
sporadic, happening a small percent of the time in job tasks that read many S3 
files but often enough to be problematic. An AWS support case is open to 
attempt to find out what could have caused this.

To avoid the external dependency on a particular Avro version to fix this, we 
can probably just patch this locally in Crunch since it's only one static 
method and apart from one legacy constant everything we need access to in the 
Avro code is public.


> Avro DataFileReader creation can hang
> -------------------------------------
>
>                 Key: CRUNCH-698
>                 URL: https://issues.apache.org/jira/browse/CRUNCH-698
>             Project: Crunch
>          Issue Type: Bug
>          Components: Core, IO
>            Reporter: Andrew Olson
>            Assignee: Josh Wills
>            Priority: Major
>             Fix For: 1.1.0
>
>          Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> A severe Avro bug [AVRO-2944|https://issues.apache.org/jira/browse/AVRO-2944] 
> was recently found in the static method that creates a DataFileReader 
> instance: it can get stuck in an infinite loop while trying to read the 
> 4-byte "magic" header of the file.
> The stack trace looks like this:
> {noformat}
> "main" #1 prio=5 os_prio=0 tid=0x00007f8798027000 nid=0x7d9c runnable [0x00007f87a0924000]
>    java.lang.Thread.State: RUNNABLE
>       at java.io.DataInputStream.read(DataInputStream.java:149)
>       at org.apache.avro.mapred.FsInput.read(FsInput.java:54)
>       at org.apache.avro.file.DataFileReader.openReader(DataFileReader.java:55)
>       at org.apache.crunch.types.avro.AvroRecordReader.initialize(AvroRecordReader.java:58)
>       at org.apache.crunch.impl.mr.run.CrunchRecordReader.nextKeyValue(CrunchRecordReader.java:152)
>       at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:571)
>       at org.apache.hadoop.mapreduce.task.MapContextImpl.nextKeyValue(MapContextImpl.java:80)
>       at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.nextKeyValue(WrappedMapper.java:91)
>       at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
>       at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:802)
>       at org.apache.hadoop.mapred.MapTask.run(MapTask.java:342)
>       at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:177)
>       at java.security.AccessController.doPrivileged(Native Method)
>       at javax.security.auth.Subject.doAs(Subject.java:422)
>       at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1893)
>       at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:171)
> {noformat}
> This was fixed in Avro 1.10.1 but has not yet been backported to any other 
> Avro versions. The issue has existed since Avro 1.5, although we have only 
> recently encountered it. It does not happen under normal circumstances; some 
> very unusual input stream behavior (a partial or throttled read, or an 
> unexpected EOF) has to trigger it. We've only seen it with the 
> S3AFileSystem's S3AInputStream, where it suddenly started a few days ago for 
> no apparent reason. Even now it is sporadic, occurring in a small percentage 
> of job tasks that read many S3 files, but often enough to be problematic. An 
> AWS support case is open to try to find out what could have caused this.
> To avoid depending on a particular Avro version for the fix, we can probably 
> just patch this locally in Crunch, since it is only one static method and, 
> apart from one legacy constant, everything we need access to in the Avro code 
> is public.
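
For illustration, a header read that tolerates the partial or throttled reads 
described above might look roughly like the sketch below. It accumulates the 
byte count across short reads and fails on EOF, rather than assuming a single 
read call fills the buffer. This is not the actual AVRO-2944 patch or Crunch 
code: the class and method names (MagicHeaderReader, readFully, 
oneByteAtATime) are hypothetical, and it works on a plain java.io.InputStream 
instead of Avro's SeekableInput.

```java
import java.io.ByteArrayInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;

/** Hypothetical helper for illustration; not part of Avro or Crunch. */
public class MagicHeaderReader {

  /**
   * Reads exactly {@code length} bytes from {@code in}, looping over short
   * reads. Throws EOFException if the stream ends first, so the loop cannot
   * spin forever on a stream that keeps returning -1.
   */
  public static byte[] readFully(InputStream in, int length) throws IOException {
    byte[] buf = new byte[length];
    int offset = 0;
    while (offset < length) {
      int n = in.read(buf, offset, length - offset);
      if (n < 0) {
        throw new EOFException(
            "Unexpected EOF with " + (length - offset) + " bytes remaining to read");
      }
      offset += n; // accumulate; do not overwrite the running count
    }
    return buf;
  }

  /** Test stream returning at most one byte per call, simulating a throttled source. */
  public static InputStream oneByteAtATime(byte[] data) {
    return new ByteArrayInputStream(data) {
      @Override
      public synchronized int read(byte[] b, int off, int len) {
        return super.read(b, off, Math.min(len, 1));
      }
    };
  }

  public static void main(String[] args) throws IOException {
    // 'O', 'b', 'j', 1 is the Avro container-file magic; the rest is filler.
    byte[] file = {'O', 'b', 'j', 1, 0, 0};
    byte[] magic = readFully(oneByteAtATime(file), 4);
    System.out.println("magic = " + Arrays.toString(magic));

    try {
      readFully(oneByteAtATime(new byte[] {'O', 'b'}), 4);
    } catch (EOFException e) {
      System.out.println("truncated file rejected: " + e.getMessage());
    }
  }
}
```

The one-byte-per-call stream stands in for the unusual S3AInputStream 
behavior; a truncated file produces an EOFException instead of a hung task.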



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
