[ https://issues.apache.org/jira/browse/MAPREDUCE-5028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13924705#comment-13924705 ]
Vinod Kumar Vavilapalli commented on MAPREDUCE-5028:
----------------------------------------------------
Looked at the patch. Thanks for working on it, this one is a year in the making.
Let's do the last push!
So the main bug fix is the int overflow stuff in MapTask.java, right?
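For anyone following along, here is a minimal sketch of the kind of int overflow being discussed. It does not reproduce the actual MapTask.java code: the class and helper names are hypothetical, and the only value taken from this issue is io.sort.mb=1280. The point is that a buffer of that size still fits in an int, but sums of offsets and lengths inside it may not.

```java
final class IntOverflowSketch {
    // Hypothetical helper: end offset of a record computed with 32-bit arithmetic.
    static int endOffsetInt(int start, int length) {
        return start + length; // silently wraps past Integer.MAX_VALUE
    }

    // Same computation, widened to 64 bits before the addition.
    static long endOffsetLong(int start, int length) {
        return (long) start + length;
    }

    // Variant that fails fast with ArithmeticException instead of wrapping.
    static int endOffsetChecked(int start, int length) {
        return Math.addExact(start, length);
    }

    public static void main(String[] args) {
        int sortMb = 1280;               // io.sort.mb from the report
        int bufferBytes = sortMb << 20;  // 1342177280, still fits in an int
        int start = bufferBytes - 1;     // assumed offset near the end of the buffer
        int length = 1 << 30;            // assumed length, chosen to force the wrap

        System.out.println("int sum:  " + endOffsetInt(start, length));  // negative
        System.out.println("long sum: " + endOffsetLong(start, length)); // 2415919103
    }
}
```

Widening to long or using Math.addExact are two generic ways to guard such arithmetic; which one fits MapTask.java depends on the surrounding code.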
Not caused by your patch, but minor suggestions for InMemoryReader.java
- start and length can and should be marked final.
- memDataIn can be private and final
Can we fix the javadoc of the getLength() API? Ideally I'd also rename the API,
but will leave it up to you. I don't have a problem reviewing a patch with that
rename. Anyways, back to javadoc for DataInputBuffer.getLength() and
DataInputBuffer.Buffer.getLength(), we can say what [~chris.douglas] said above
- (from ByteArrayInputStream) "returns the index one greater than the last
valid character in the input stream buffer."
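To make the quoted semantics concrete, here is a small sketch that uses the JDK's ByteArrayInputStream directly rather than any Hadoop class. The subclass and its accessor names are hypothetical and exist only to expose the protected `count` and `pos` fields. It shows that the end index is start + length, not the number of readable bytes, which is the confusion the javadoc should rule out.

```java
import java.io.ByteArrayInputStream;

final class EndIndexSketch extends ByteArrayInputStream {
    EndIndexSketch(byte[] buf, int offset, int length) {
        super(buf, offset, length);
    }

    // "The index one greater than the last valid character in the input stream buffer."
    int endIndex() {
        return count;
    }

    // Index of the next byte to be read.
    int position() {
        return pos;
    }

    // Number of bytes that can still be read.
    int remaining() {
        return count - pos;
    }

    public static void main(String[] args) {
        byte[] data = new byte[100];
        EndIndexSketch in = new EndIndexSketch(data, 40, 10);

        System.out.println(in.endIndex());  // 50: start + length
        System.out.println(in.remaining()); // 10: the actual data length

        // Passing the end index where a length is expected exposes 50 bytes, not 10.
        EndIndexSketch wrong = new EndIndexSketch(data, 40, in.endIndex());
        System.out.println(wrong.remaining()); // 50
    }
}
```

A caller that wants the data length therefore needs end index minus position, not the end index itself.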
There seems to be one more related bug w.r.t. the usage of DataInputBuffer.reset().
Can you also cross-verify Task.ValuesIterator.readNextKey() and readNextValue()?
Why not run this LargeSorter with large values for io.sort.mb as a unit test?
I'm afraid that we may unknowingly add more such issues in future if we don't
automatically run it as part of our test-suite. The problem will be playing
with surefire's heap sizes, but that's about it.
We should hop onto MAPREDUCE-5032 next as I am not sure if we are fixing
everything here.
> Maps fail when io.sort.mb is set to high value
> ----------------------------------------------
>
> Key: MAPREDUCE-5028
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5028
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Affects Versions: 1.1.1, 2.0.3-alpha, 0.23.5
> Reporter: Karthik Kambatla
> Assignee: Karthik Kambatla
> Priority: Critical
> Fix For: 1.2.0
>
> Attachments: MR-5028_testapp.patch, mr-5028-1.patch, mr-5028-2.patch,
> mr-5028-branch1.patch, mr-5028-branch1.patch, mr-5028-branch1.patch,
> mr-5028-trunk.patch, mr-5028-trunk.patch, mr-5028-trunk.patch,
> repro-mr-5028.patch
>
>
> Verified the problem exists on branch-1 with the following configuration:
> Pseudo-dist mode: 2 maps/ 1 reduce, mapred.child.java.opts=-Xmx2048m,
> io.sort.mb=1280, dfs.block.size=2147483648
> Run teragen to generate 4 GB data
> Maps fail when you run wordcount on this configuration with the following
> error:
> {noformat}
> java.io.IOException: Spill failed
> at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:1031)
> at org.apache.hadoop.mapred.MapTask$NewOutputCollector.write(MapTask.java:692)
> at org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
> at org.apache.hadoop.examples.WordCount$TokenizerMapper.map(WordCount.java:45)
> at org.apache.hadoop.examples.WordCount$TokenizerMapper.map(WordCount.java:34)
> at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
> at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:766)
> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
> at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:396)
> at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1149)
> at org.apache.hadoop.mapred.Child.main(Child.java:249)
> Caused by: java.io.EOFException
> at java.io.DataInputStream.readInt(DataInputStream.java:375)
> at org.apache.hadoop.io.IntWritable.readFields(IntWritable.java:38)
> at org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:67)
> at org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:40)
> at org.apache.hadoop.mapreduce.ReduceContext.nextKeyValue(ReduceContext.java:116)
> at org.apache.hadoop.mapreduce.ReduceContext.nextKey(ReduceContext.java:92)
> at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:175)
> at org.apache.hadoop.mapred.Task$NewCombinerRunner.combine(Task.java:1505)
> at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:1438)
> at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.access$1800(MapTask.java:855)
> at org.apache.hadoop.mapred.MapTask$MapOutputBuffer$SpillThread.run(MapTask.java:1346)
> {noformat}