Ben Roling created MAPREDUCE-5767:
-------------------------------------
Summary: Data corruption when single value exceeds map buffer size
(io.sort.mb)
Key: MAPREDUCE-5767
URL: https://issues.apache.org/jira/browse/MAPREDUCE-5767
Project: Hadoop Map/Reduce
Issue Type: Bug
Components: mrv1
Affects Versions: 0.20.1
Reporter: Ben Roling
There is an issue in org.apache.hadoop.mapred.MapTask in 0.20 that can cause
data corruption when the size of a single value produced by the mapper exceeds
the size of the map output buffer (roughly io.sort.mb).
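For concreteness, a hypothetical mapper along these lines should set up the conditions described below (a few ordinary records followed by one value larger than the buffer). The class name and sizes are invented for illustration, and it assumes io.sort.mb is at a 100 MB setting:
{code:java}
import java.io.IOException;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Hypothetical repro sketch -- names and sizes are invented, not from the
// affected job.  Writes some small records, then a single ~110 MB value,
// which is larger than a 100 MB map output buffer (io.sort.mb = 100).
public class OversizedValueMapper
    extends Mapper<LongWritable, Text, Text, BytesWritable> {
  @Override
  protected void map(LongWritable key, Text value, Context context)
      throws IOException, InterruptedException {
    // A few ordinary records so the buffer is not empty when the big one arrives.
    for (int i = 0; i < 10; i++) {
      context.write(new Text("small-" + i), new BytesWritable(new byte[1024]));
    }
    // A single value larger than the whole sort buffer.
    byte[] big = new byte[110 * 1024 * 1024];
    context.write(new Text("big"), new BytesWritable(big));
  }
}
{code}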
I experienced this in CDH4.2.1, but am logging it here for greater visibility
in case anyone else runs across the same problem.
The issue does not exist in 0.21 and beyond due to the implementation of
MAPREDUCE-64. That JIRA significantly changes the way the map output buffering
is done and it looks like the issue has been resolved by those changes.
I expect this bug will likely be closed as Won't Fix since 0.20 is obsolete.
As stated previously, I am just logging this issue for visibility in case
anyone else is still running something based on 0.20 and encounters the same
problem.
In my situation the issue manifested as an ArrayIndexOutOfBoundsException in
the reduce phase when deserializing a key -- causing the job to fail. However,
I think the problem could manifest in a more dangerous fashion where the
affected job succeeds, but produces corrupt output. The stack trace I saw was:
2014-02-13 01:07:34,690 WARN org.apache.hadoop.mapred.Child: Error running child
java.lang.ArrayIndexOutOfBoundsException: 24
    at org.apache.avro.io.parsing.Symbol$Alternative.getSymbol(Symbol.java:364)
    at org.apache.avro.io.ResolvingDecoder.doAction(ResolvingDecoder.java:229)
    at org.apache.avro.io.parsing.Parser.advance(Parser.java:88)
    at org.apache.avro.io.ResolvingDecoder.readIndex(ResolvingDecoder.java:206)
    at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:148)
    at org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:173)
    at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:144)
    at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:135)
    at org.apache.crunch.types.avro.SafeAvroSerialization$AvroWrapperDeserializer.deserialize(SafeAvroSerialization.java:86)
    at org.apache.crunch.types.avro.SafeAvroSerialization$AvroWrapperDeserializer.deserialize(SafeAvroSerialization.java:70)
    at org.apache.hadoop.mapreduce.task.ReduceContextImpl.nextKeyValue(ReduceContextImpl.java:135)
    at org.apache.hadoop.mapreduce.task.ReduceContextImpl.nextKey(ReduceContextImpl.java:114)
    at org.apache.hadoop.mapreduce.lib.reduce.WrappedReducer$Context.nextKey(WrappedReducer.java:291)
    at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:163)
    at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:610)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:444)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
    at org.apache.hadoop.mapred.Child.main(Child.java:262)
The problem appears to me to be in
org.apache.hadoop.mapred.MapTask.MapOutputBuffer.Buffer.write(byte[], int, int).
The sequence of events that leads up to the issue is:
* some complete records (cumulative size less than the total buffer size) are written to the buffer
* a large (over io.sort.mb) record starts writing
* the [soft buffer limit is exceeded|https://github.com/apache/hadoop-common/blob/release-0.20.1/src/mapred/org/apache/hadoop/mapred/MapTask.java#L1030] - a spill starts
* the write of the large record continues
* the buffer becomes [full|https://github.com/apache/hadoop-common/blob/release-0.20.1/src/mapred/org/apache/hadoop/mapred/MapTask.java#L1012]
* [wrap|https://github.com/apache/hadoop-common/blob/release-0.20.1/src/mapred/org/apache/hadoop/mapred/MapTask.java#L1013] evaluates to true, suggesting the buffer can be safely wrapped
* writing of the large record continues until a write occurs such that bufindex + len == bufstart exactly. When this happens, [buffull|https://github.com/apache/hadoop-common/blob/release-0.20.1/src/mapred/org/apache/hadoop/mapred/MapTask.java#L1018] evaluates to false, so the data is written to the buffer without incident
* writing of the large value continues with another call to write(), starting the corruption of the buffer. A full buffer can no longer be detected by the [buffull logic|https://github.com/apache/hadoop-common/blob/release-0.20.1/src/mapred/org/apache/hadoop/mapred/MapTask.java#L1012] that is used when bufindex >= bufstart
The key to triggering this problem is a write where bufindex + len equals
bufstart exactly.
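To make that boundary concrete, here is a small, self-contained sketch (not the real MapTask code) of how the two space checks evaluate around it. The variable names mirror the linked source; the concrete values are invented, and for simplicity it assumes the in-progress spill has finished by this point, so bufstart has advanced to bufend:
{code:java}
// Illustrative sketch only -- not the actual MapOutputBuffer.Buffer.write().
// Values are invented; assumes the earlier spill completed (bufstart == bufend).
public class BufferBoundarySketch {
  public static void main(String[] args) {
    final int bufvoid = 100;          // kvbuffer.length
    int bufstart = 70, bufend = 70;   // spill finished; both point mid-buffer
    int bufindex = 50;                // writer has already wrapped past bufvoid
    int len = 20;                     // next chunk of the large record

    // Wrapped state (bufend <= bufindex does not hold), so the L1018 check runs:
    boolean buffull = bufindex + len > bufstart;    // 50 + 20 > 70 -> false
    System.out.println("first write: buffull = " + buffull);   // write proceeds
    bufindex += len;                                // now bufindex == bufstart == 70

    // Next call: the state satisfies bufstart <= bufend <= bufindex, so only the
    // L1012 check against bufvoid runs -- it no longer protects the unspilled
    // record bytes that currently fill the rest of the buffer.
    len = 20;
    buffull = bufindex + len > bufvoid;             // 70 + 20 > 100 -> false
    System.out.println("second write: buffull = " + buffull);  // write proceeds -> corruption
  }
}
{code}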
I have titled the issue as having to do with writing large records (over
io.sort.mb), but really I think the issue *could* occur with smaller records if
the serializer generated a write of exactly the right size. For example, the
buffer could be getting close to full without having exceeded the soft limit,
and then a collect() of a new value could trigger a write() such that
bufindex + len == bufstart. The size of that write would have to be relatively
large -- greater than the free space offered by the soft limit (20% of the
buffer by default) -- making it pretty unlikely that the issue occurs this way.
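For a rough sense of the numbers, here is the arithmetic assuming the defaults io.sort.mb = 100 and io.sort.spill.percent = 0.80 (the figures are illustrative, not from a real job):
{code:java}
// Back-of-the-envelope arithmetic for the smaller-record variant, assuming
// default io.sort.mb = 100 and io.sort.spill.percent = 0.80.
public class SoftLimitHeadroom {
  public static void main(String[] args) {
    long bufvoid = 100L * 1024 * 1024;         // io.sort.mb = 100 -> 104,857,600 bytes
    long softLimit = (long) (bufvoid * 0.80);  // a spill is triggered past this point
    long headroom = bufvoid - softLimit;       // 20,971,520 bytes (20% of the buffer)
    // Below the soft limit, a single serialized write would have to exceed
    // 'headroom' AND land so that bufindex + len == bufstart exactly.
    System.out.println("headroom above soft limit: " + headroom + " bytes");
  }
}
{code}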
--
This message was sent by Atlassian JIRA
(v6.1.5#6160)