[
https://issues.apache.org/jira/browse/MAPREDUCE-5767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14244062#comment-14244062
]
Tom De Leu commented on MAPREDUCE-5767:
---------------------------------------
Thanks a lot for this analysis! We had the exact same problem at work, running
CDH3.5 with Crunch 0.8.4 and Avro 1.7.7.
Only after a couple of days of trying, and failing, to find the cause of our
problem did I come across this issue via a lucky Google search.
I can confirm that increasing *io.sort.mb* solved our problem.
Thank you for saving us probably weeks of investigation :)
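For anyone else hitting this on MRv1, the workaround is simply to make the map
output buffer larger than the biggest single serialized record. A minimal sketch
of the override (the 512 MB value is only illustrative; pick whatever exceeds
your largest record):

    // Hypothetical MRv1 job setup; only the io.sort.mb override matters here.
    org.apache.hadoop.mapred.JobConf conf = new org.apache.hadoop.mapred.JobConf();
    conf.setInt("io.sort.mb", 512);                 // map output buffer size in MB; must exceed the largest record
    conf.setFloat("io.sort.spill.percent", 0.80f);  // default soft limit, shown only for context

The same property can also be set in mapred-site.xml, or with -Dio.sort.mb=512
on the command line if the job goes through GenericOptionsParser.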
> Data corruption when single value exceeds map buffer size (io.sort.mb)
> ----------------------------------------------------------------------
>
> Key: MAPREDUCE-5767
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5767
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Components: mrv1
> Affects Versions: 0.20.1
> Reporter: Ben Roling
>
> There is an issue in org.apache.hadoop.mapred.MapTask in 0.20 that can cause
> data corruption when the size of a single value produced by the mapper
> exceeds the size of the map output buffer (roughly io.sort.mb).
> I experienced this issue in CDH4.2.1, but am logging it here for greater
> visibility in case anyone else runs across it.
> The issue does not exist in 0.21 and beyond due to the implementation of
> MAPREDUCE-64. That JIRA significantly changes the way the map output
> buffering is done and it looks like the issue has been resolved by those
> changes.
> I expect this bug will likely be closed as won't-fix because 0.20 is obsolete.
> As stated previously, I am just logging this issue for visibility in case
> anyone else is still running something based on 0.20 and encounters the same
> problem.
> In my situation the issue manifested as an ArrayIndexOutOfBoundsException in
> the reduce phase when deserializing a key -- causing the job to fail.
> However, I think the problem could manifest in a more dangerous fashion where
> the affected job succeeds, but produces corrupt output. The stack trace I
> saw was:
> 2014-02-13 01:07:34,690 WARN org.apache.hadoop.mapred.Child: Error running child
> java.lang.ArrayIndexOutOfBoundsException: 24
> at org.apache.avro.io.parsing.Symbol$Alternative.getSymbol(Symbol.java:364)
> at org.apache.avro.io.ResolvingDecoder.doAction(ResolvingDecoder.java:229)
> at org.apache.avro.io.parsing.Parser.advance(Parser.java:88)
> at org.apache.avro.io.ResolvingDecoder.readIndex(ResolvingDecoder.java:206)
> at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:148)
> at org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:173)
> at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:144)
> at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:135)
> at org.apache.crunch.types.avro.SafeAvroSerialization$AvroWrapperDeserializer.deserialize(SafeAvroSerialization.java:86)
> at org.apache.crunch.types.avro.SafeAvroSerialization$AvroWrapperDeserializer.deserialize(SafeAvroSerialization.java:70)
> at org.apache.hadoop.mapreduce.task.ReduceContextImpl.nextKeyValue(ReduceContextImpl.java:135)
> at org.apache.hadoop.mapreduce.task.ReduceContextImpl.nextKey(ReduceContextImpl.java:114)
> at org.apache.hadoop.mapreduce.lib.reduce.WrappedReducer$Context.nextKey(WrappedReducer.java:291)
> at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:163)
> at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:610)
> at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:444)
> at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:396)
> at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
> at org.apache.hadoop.mapred.Child.main(Child.java:262)
> The problem appears to me to be in
> org.apache.hadoop.mapred.MapTask.MapOutputBuffer.Buffer.write(byte[], int, int).
> The sequence of events that leads up to the issue is:
> * some complete records (cumulative size less than the total buffer size) are
> written to the buffer
> * a large (over io.sort.mb) record starts writing
> * the
> [soft buffer limit is exceeded|https://github.com/apache/hadoop-common/blob/release-0.20.1/src/mapred/org/apache/hadoop/mapred/MapTask.java#L1030]
> and a spill starts
> * the write of the large record continues
> * the buffer becomes
> [full|https://github.com/apache/hadoop-common/blob/release-0.20.1/src/mapred/org/apache/hadoop/mapred/MapTask.java#L1012]
> *
> [wrap|https://github.com/apache/hadoop-common/blob/release-0.20.1/src/mapred/org/apache/hadoop/mapred/MapTask.java#L1013]
> evaluates to true, suggesting the buffer can be safely wrapped
> * writing the large record continues until a write occurs such that
> bufindex + len == bufstart exactly. When this happens,
> [buffull|https://github.com/apache/hadoop-common/blob/release-0.20.1/src/mapred/org/apache/hadoop/mapred/MapTask.java#L1018]
> evaluates to false, so the data gets written to the buffer without incident
> * writing of the large value continues with another call to write(), and the
> corruption of the buffer begins. A full buffer can no longer be detected by the
> [buffull logic|https://github.com/apache/hadoop-common/blob/release-0.20.1/src/mapred/org/apache/hadoop/mapred/MapTask.java#L1012]
> that is used when bufindex >= bufstart
> The key to this problem occurring is a write where bufindex + len equals
> bufstart exactly; a simplified sketch of the checks involved follows.
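> To make the corner case easier to see, here is a small self-contained sketch.
> It is not the Hadoop source: the expressions are paraphrased from the checks
> linked above, and the bufend/spill bookkeeping is simplified to toy values.
>
> // Toy paraphrase of the "is there room?" checks in MapOutputBuffer.Buffer.write().
> // bufvoid is the usable end of the circular buffer, bufstart the start of
> // unspilled data, bufindex the current write position (units are bytes).
> public class BuffullSketch {
>
>   static boolean wouldReportFull(int bufstart, int bufend, int bufindex,
>                                  int bufvoid, int len) {
>     if (bufstart <= bufend && bufend <= bufindex) {
>       // write position ahead of the unspilled data: "full" only means running
>       // off the end of the array (the check linked at L1012)
>       return bufindex + len > bufvoid;
>     } else {
>       // write position has wrapped behind bufstart (the check linked at L1018);
>       // note the strict '>': landing exactly on bufstart is not "full"
>       return bufindex + len > bufstart;
>     }
>   }
>
>   public static void main(String[] args) {
>     int bufvoid = 100;   // toy 100-byte buffer
>     int bufstart = 60;   // unspilled data occupies 60..99 and 0..39
>     int bufend = 60;
>     int bufindex = 40;   // writer has wrapped and is now at offset 40
>
>     // A 20-byte write lands exactly on bufstart: reported as NOT full.
>     System.out.println(wouldReportFull(bufstart, bufend, bufindex, bufvoid, 20)); // false
>     bufindex += 20;      // bufindex == bufstart == 60
>
>     // The next write takes the other branch and is compared against bufvoid,
>     // not bufstart, so it is also reported as NOT full -- it would silently
>     // overwrite the unspilled data at 60..89.
>     System.out.println(wouldReportFull(bufstart, bufend, bufindex, bufvoid, 30)); // false
>   }
> }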
> I have titled the issue as having to do with writing large records (over
> io.sort.mb), but really I think the issue *could* occur on smaller records if
> the serializer generated a write of exactly the right size. For example, the
> buffer could be getting close to full, but still under the buffer soft limit,
> when a collect() on a new value triggers a write() such that
> bufindex + len == bufstart. The write would have to be relatively large --
> greater than the free space offered by the soft limit (20% of the buffer by
> default) -- making it pretty unlikely that the issue occurs this way; a worked
> toy example follows.
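> A worked toy example of that scenario, with illustrative numbers only (the
> same toy 100-byte buffer and default 80% soft limit as the sketch above):
>
> // Illustrative only: how a record smaller than io.sort.mb could still hit
> // bufindex + len == bufstart. Reuses the wouldReportFull() sketch above.
> int bufvoid = 100;    // toy buffer size
> int softLimit = 80;   // 80% default soft limit
> int bufstart = 70;    // a previous spill ended here; unspilled data starts at 70
> int bufend = 70;
> int bufindex = 45;    // writer wrapped past the end and is now at offset 45
> int occupied = (bufvoid - bufstart) + bufindex;   // 30 + 45 = 75 < 80: soft limit not yet hit
> int len = 25;         // one write of 25 bytes: bufindex + len == bufstart exactly
> // wouldReportFull(70, 70, 45, 100, 25) is false, the buffer fills completely,
> // and the next write is checked against bufvoid instead of bufstart. Note that
> // len (25) exceeds the 20-byte soft-limit headroom, matching the constraint above.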
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)