[
https://issues.apache.org/jira/browse/HADOOP-3442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Chris Douglas updated HADOOP-3442:
----------------------------------
Attachment: spillbuffers.patch
bq. One thing I've found is that it depends on the amount of data per reducer.
I.e. tripling the number of reducers causes the stack overflow to not occur.
That makes sense; the spill is sorted by partition, then by key. Increasing the
number of reducers shrinks each partition, so the patterns that drive the sort
into deep recursion become less likely.
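To illustrate the failure mode (a generic sketch, not the actual code in o.a.h.util.QuickSort): a textbook quicksort that recurses into both partitions can reach O(n) stack depth on adversarial input, while recursing only into the smaller partition and looping over the larger bounds the stack at O(log n):

```java
// Sketch of quicksort recursion depth on adversarial (already-sorted) input.
// Class and method names here are illustrative, not from the Hadoop source.
public class QuickSortDepth {
    static int maxDepth;

    // Naive variant: recurses into both sides. With a last-element pivot,
    // already-sorted input makes one side hold n-1 elements every time,
    // so the recursion depth grows linearly with n.
    static void naive(int[] a, int lo, int hi, int depth) {
        maxDepth = Math.max(maxDepth, depth);
        if (lo >= hi) return;
        int p = partition(a, lo, hi);
        naive(a, lo, p - 1, depth + 1);
        naive(a, p + 1, hi, depth + 1);
    }

    // Bounded variant: recurse into the smaller side only, loop on the
    // larger side, so each recursive call at least halves the range.
    static void bounded(int[] a, int lo, int hi, int depth) {
        maxDepth = Math.max(maxDepth, depth);
        while (lo < hi) {
            int p = partition(a, lo, hi);
            if (p - lo < hi - p) {
                bounded(a, lo, p - 1, depth + 1);
                lo = p + 1;
            } else {
                bounded(a, p + 1, hi, depth + 1);
                hi = p - 1;
            }
        }
    }

    // Lomuto partition with a last-element pivot (the worst choice for
    // sorted input, which is what makes the naive variant blow up).
    static int partition(int[] a, int lo, int hi) {
        int pivot = a[hi], i = lo;
        for (int j = lo; j < hi; j++) {
            if (a[j] < pivot) { int t = a[i]; a[i] = a[j]; a[j] = t; i++; }
        }
        int t = a[i]; a[i] = a[hi]; a[hi] = t;
        return i;
    }

    public static void main(String[] args) {
        int n = 2000;
        int[] sorted = new int[n];
        for (int i = 0; i < n; i++) sorted[i] = i;

        maxDepth = 0;
        naive(sorted.clone(), 0, n - 1, 0);
        System.out.println("naive depth:   " + maxDepth);

        maxDepth = 0;
        bounded(sorted.clone(), 0, n - 1, 0);
        System.out.println("bounded depth: " + maxDepth);
    }
}
```

With a real JVM stack (a few thousand frames by default), the naive variant turns an unlucky key distribution directly into a StackOverflowError once the spill buffer is large enough.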
I'm attaching a patch that dumps some internal data structures on a
StackOverflowError, along with a verification tool. It's a server-side patch,
so it must be deployed with the TaskTrackers. The dumped data should be written
to the job's output directory. Once the spill files are pulled to local disk,
running the tool
*should* reproduce the error. If someone who can reproduce this would compress
and post the buffer/index data here, it would be hugely helpful.
What's written to disk are the entries in the MapOutputBuffer (keys and values)
for a single spill, plus a table recording the partitions and key/value lengths.
> QuickSort may get into unbounded recursion
> ------------------------------------------
>
> Key: HADOOP-3442
> URL: https://issues.apache.org/jira/browse/HADOOP-3442
> Project: Hadoop Core
> Issue Type: Bug
> Components: mapred
> Affects Versions: 0.17.0
> Reporter: Runping Qi
> Assignee: Chris Douglas
> Attachments: 3442-0.patch, CheckSortBuffer.java, HADOOP-3442.patch,
> spillbuffers.patch
>
>
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.