The reason I filed this bug is that I believe one of the following guys
at org.apache.hadoop.streaming.PipeMapper.map
at org.apache.hadoop.mapred.MapRunner.run
should catch the exception and explain to the user what it thinks has
happened -- e.g. to show how many records have been buffered but not
consumed, what was the first / last record, etc.
Throwing an exception is rude.
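To make the ask concrete, here is a rough sketch (plain java.io, not the real PipeMapper/MapRunner code; the DiagnosticRecordWriter name and its counters are hypothetical) of the kind of context the wrapper could attach before re-throwing:

    import java.io.DataOutputStream;
    import java.io.IOException;

    // Hypothetical wrapper around the stream that feeds the external mapper's
    // stdin.  On failure it reports how many records were written, plus the
    // first and last record, instead of surfacing a bare "Broken pipe".
    public class DiagnosticRecordWriter {
        private final DataOutputStream out;
        private long recordsWritten = 0;
        private String firstRecord = null;
        private String lastRecord = null;

        public DiagnosticRecordWriter(DataOutputStream out) {
            this.out = out;
        }

        public void write(String record) throws IOException {
            if (firstRecord == null) {
                firstRecord = record;
            }
            lastRecord = record;
            try {
                out.writeBytes(record);
                out.writeByte('\n');
                recordsWritten++;
            } catch (IOException e) {
                // Wrap the low-level failure with the context the user needs.
                throw new IOException("Failed writing record #" + (recordsWritten + 1)
                    + " to the streaming mapper's stdin; records already written: "
                    + recordsWritten + ", first record: |" + firstRecord
                    + "|, last record attempted: |" + lastRecord
                    + "| (the child process may have died or stopped reading)", e);
            }
        }
    }

Whether this belongs in PipeMapper.map or MapRunner.run is secondary; the point is that the framework, not the user, already has these counts.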
On Jan 24, 2008, at 10:03 AM, Runping Qi (JIRA) wrote:
[ https://issues.apache.org/jira/browse/HADOOP-2438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12562157#action_12562157 ]
Runping Qi commented on HADOOP-2438:
------------------------------------
If the streaming mapper stalled for some reason and cannot consume its
std input while the Java MapRed wrapper continues to pipe data to it,
then too much data may accumulate in the std input pipe.
That may cause a broken pipe or an OOM exception. What did the mapper do?
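The failure mode described above can be reproduced without Hadoop; here is a minimal sketch (the BrokenPipeDemo class and the use of /bin/true as a stand-in child are illustrative assumptions, not anything from the Hadoop code):

    import java.io.IOException;
    import java.io.OutputStream;
    import java.nio.charset.StandardCharsets;

    // A parent keeps piping records into a child that has stopped consuming
    // its stdin.  Once the child is gone, writes fail with "Broken pipe" --
    // the same exception as in the first stack dump below.  A wrapper that
    // instead buffers the unconsumed data in memory would eventually hit an
    // OutOfMemoryError rather than a broken pipe.
    public class BrokenPipeDemo {
        public static void main(String[] args) throws Exception {
            // /bin/true exits immediately without reading its stdin -- a
            // stand-in for a streaming mapper that crashed or stalled.
            Process child = new ProcessBuilder("/bin/true").start();
            child.waitFor();

            OutputStream toChild = child.getOutputStream();
            byte[] record = "some input record\n".getBytes(StandardCharsets.UTF_8);
            try {
                while (true) {
                    toChild.write(record);   // keeps piping, like the MapRed wrapper
                    toChild.flush();
                }
            } catch (IOException e) {
                // On Linux this prints: java.io.IOException: Broken pipe
                System.err.println("Write to child failed: " + e);
            }
        }
    }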
In streaming, jobs that used to work crash in the map phase -- even if the mapper is /bin/cat
----------------------------------------------------------------------------------------------
Key: HADOOP-2438
URL: https://issues.apache.org/jira/browse/HADOOP-2438
Project: Hadoop Core
Issue Type: Bug
Affects Versions: 0.15.1
Reporter: arkady borkovsky
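For reference, the "/bin/cat" case in the summary corresponds to an identity-mapper streaming job roughly like the following (the input/output paths are placeholders, and the streaming jar location varies by install):

    hadoop jar $HADOOP_HOME/contrib/hadoop-streaming.jar \
        -input /user/someone/input \
        -output /user/someone/output \
        -mapper /bin/cat \
        -reducer /bin/cat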
The exception is either "out of memory" or "broken pipe" -- see
both stack dumps below.
last Hadoop input: |null|
last tool output: |[EMAIL PROTECTED]|
Date: Sat Dec 15 21:02:18 UTC 2007
java.io.IOException: Broken pipe
        at java.io.FileOutputStream.writeBytes(Native Method)
        at java.io.FileOutputStream.write(FileOutputStream.java:260)
        at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65)
        at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:123)
        at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:124)
        at java.io.DataOutputStream.flush(DataOutputStream.java:106)
        at org.apache.hadoop.streaming.PipeMapper.map(PipeMapper.java:96)
        at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:192)
        at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:1760)
        at org.apache.hadoop.streaming.PipeMapper.map(PipeMapper.java:107)
        at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:192)
        at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:1760)
-------------------------------------------------
java.io.IOException: MROutput/MRErrThread failed: java.lang.OutOfMemoryError: Java heap space
        at java.util.Arrays.copyOf(Arrays.java:2786)
        at java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:94)
        at java.io.DataOutputStream.write(DataOutputStream.java:90)
        at org.apache.hadoop.io.Text.write(Text.java:243)
        at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:347)
        at org.apache.hadoop.streaming.PipeMapRed$MROutputThread.run(PipeMapRed.java:344)
        at org.apache.hadoop.streaming.PipeMapper.map(PipeMapper.java:76)
        at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:192)
        at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:1760)
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.