I couldn't tell much from the logs about why this is happening.

For reporting status, you can write to stderr from your script. The format should be reporter:status:<message>. If the program emits such lines on stderr, the framework will treat them as status reports. Hope this clarifies.
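As a minimal sketch of that protocol (assuming a Python streaming script; the helper name and reporting interval are mine, not from the thread):

```python
import sys

def report_status(message):
    # Hadoop Streaming interprets stderr lines of the form
    # "reporter:status:<message>" as status updates, not ordinary logging.
    sys.stderr.write("reporter:status:%s\n" % message)
    sys.stderr.flush()

def main():
    for n, line in enumerate(sys.stdin, 1):
        # ... process the record here ...
        if n % 10000 == 0:
            report_status("processed %d records" % n)

if __name__ == "__main__":
    main()
```

Emitting a status line periodically also resets the task's progress timeout, which is why long-running scripts are encouraged to do this.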

Thanks
Amareshwari
Rick Hangartner wrote:
Hi Amareshwari,

This may or may not be helpful.

Here's an example of three runs in rapid succession. The first and last completed without any problems. The middle one also completed in this case, but its log has three exceptions of the type we reported (we think a fourth exception would have terminated the run, based on what we've seen in the past).

Here are excerpts from the progress reports for the three runs that we see on the remote machine from which we are submitting the streaming job:

08/12/23 19:02:46 INFO streaming.StreamJob: Running job: job_200812221742_0074
08/12/23 19:02:46 INFO streaming.StreamJob: To kill this job, run:
08/12/23 19:02:46 INFO streaming.StreamJob: /Users/hadoop/hadoop/bin/../bin/hadoop job -Dmapred.job.tracker=hnode9.cor.mystrands.in:9001 -kill job_200812221742_0074
08/12/23 19:02:46 INFO streaming.StreamJob: Tracking URL: http://hnode9.cor.mystrands.in:50030/jobdetails.jsp?jobid=job_200812221742_0074
08/12/23 19:02:47 INFO streaming.StreamJob:  map 0%  reduce 0%
08/12/23 19:02:50 INFO streaming.StreamJob:  map 22%  reduce 0%
08/12/23 19:02:51 INFO streaming.StreamJob:  map 56%  reduce 0%
08/12/23 19:02:54 INFO streaming.StreamJob:  map 67%  reduce 0%
08/12/23 19:02:55 INFO streaming.StreamJob:  map 78%  reduce 0%
08/12/23 19:02:57 INFO streaming.StreamJob:  map 87%  reduce 0%
08/12/23 19:02:59 INFO streaming.StreamJob:  map 90%  reduce 0%
08/12/23 19:03:01 INFO streaming.StreamJob:  map 92%  reduce 0%
08/12/23 19:03:02 INFO streaming.StreamJob:  map 96%  reduce 22%
08/12/23 19:03:07 INFO streaming.StreamJob:  map 100%  reduce 26%
08/12/23 19:03:18 INFO streaming.StreamJob:  map 100%  reduce 73%
08/12/23 19:03:23 INFO streaming.StreamJob:  map 100%  reduce 91%
08/12/23 19:03:25 INFO streaming.StreamJob:  map 100%  reduce 100%
08/12/23 19:03:25 INFO streaming.StreamJob: Job complete: job_200812221742_0074

08/12/23 19:04:10 INFO streaming.StreamJob: Running job: job_200812221742_0075
08/12/23 19:04:10 INFO streaming.StreamJob: To kill this job, run:
08/12/23 19:04:10 INFO streaming.StreamJob: /Users/hadoop/hadoop/bin/../bin/hadoop job -Dmapred.job.tracker=hnode9.cor.mystrands.in:9001 -kill job_200812221742_0075
08/12/23 19:04:10 INFO streaming.StreamJob: Tracking URL: http://hnode9.cor.mystrands.in:50030/jobdetails.jsp?jobid=job_200812221742_0075
08/12/23 19:04:11 INFO streaming.StreamJob:  map 0%  reduce 0%
08/12/23 19:04:14 INFO streaming.StreamJob:  map 33%  reduce 0%
08/12/23 19:04:15 INFO streaming.StreamJob:  map 56%  reduce 0%
08/12/23 19:04:16 INFO streaming.StreamJob:  map 67%  reduce 0%
08/12/23 19:04:17 INFO streaming.StreamJob:  map 71%  reduce 0%
08/12/23 19:04:20 INFO streaming.StreamJob:  map 87%  reduce 0%
08/12/23 19:04:23 INFO streaming.StreamJob:  map 89%  reduce 0%
08/12/23 19:04:24 INFO streaming.StreamJob:  map 95%  reduce 0%
08/12/23 19:04:26 INFO streaming.StreamJob:  map 98%  reduce 0%
08/12/23 19:04:28 INFO streaming.StreamJob:  map 100%  reduce 0%
08/12/23 19:04:30 INFO streaming.StreamJob:  map 100%  reduce 30%
08/12/23 19:04:41 INFO streaming.StreamJob:  map 100%  reduce 0%
08/12/23 19:05:08 INFO streaming.StreamJob:  map 100%  reduce 84%
08/12/23 19:05:13 INFO streaming.StreamJob:  map 100%  reduce 100%
08/12/23 19:05:14 INFO streaming.StreamJob: Job complete: job_200812221742_0075

08/12/23 19:08:24 INFO streaming.StreamJob: Running job: job_200812221742_0076
08/12/23 19:08:24 INFO streaming.StreamJob: To kill this job, run:
08/12/23 19:08:24 INFO streaming.StreamJob: /Users/hadoop/hadoop/bin/../bin/hadoop job -Dmapred.job.tracker=hnode9.cor.mystrands.in:9001 -kill job_200812221742_0076
08/12/23 19:08:24 INFO streaming.StreamJob: Tracking URL: http://hnode9.cor.mystrands.in:50030/jobdetails.jsp?jobid=job_200812221742_0076
08/12/23 19:08:26 INFO streaming.StreamJob:  map 0%  reduce 0%
08/12/23 19:08:29 INFO streaming.StreamJob:  map 11%  reduce 0%
08/12/23 19:08:31 INFO streaming.StreamJob:  map 44%  reduce 0%
08/12/23 19:08:32 INFO streaming.StreamJob:  map 56%  reduce 0%
08/12/23 19:08:33 INFO streaming.StreamJob:  map 78%  reduce 0%
08/12/23 19:08:35 INFO streaming.StreamJob:  map 81%  reduce 0%
08/12/23 19:08:36 INFO streaming.StreamJob:  map 91%  reduce 0%
08/12/23 19:08:37 INFO streaming.StreamJob:  map 92%  reduce 0%
08/12/23 19:08:40 INFO streaming.StreamJob:  map 96%  reduce 19%
08/12/23 19:08:45 INFO streaming.StreamJob:  map 100%  reduce 19%
08/12/23 19:08:50 INFO streaming.StreamJob:  map 100%  reduce 30%
08/12/23 19:09:01 INFO streaming.StreamJob:  map 100%  reduce 85%
08/12/23 19:09:05 INFO streaming.StreamJob:  map 100%  reduce 100%
08/12/23 19:09:05 INFO streaming.StreamJob: Job complete: job_200812221742_0076

For the middle run, the job tracker log reports the following exceptions; for the first and last runs, the log lists no exceptions:

2008-12-23 19:04:27,816 INFO org.apache.hadoop.mapred.JobInProgress: Task 'attempt_200812221742_0075_m_000000_0' has completed task_200812221742_0075_m_000000 successfully.
2008-12-23 19:04:36,505 INFO org.apache.hadoop.mapred.TaskInProgress: Error from attempt_200812221742_0075_r_000000_0: java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 1
    at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:301)
    at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:518)
    at org.apache.hadoop.streaming.PipeReducer.reduce(PipeReducer.java:102)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:318)
    at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2207)

2008-12-23 19:04:39,050 INFO org.apache.hadoop.mapred.JobTracker: Removed completed task 'attempt_200812221742_0075_r_000000_0' from 'tracker_hnode2.cor.mystrands.in:localhost/127.0.0.1:36968'
2008-12-23 19:04:39,214 INFO org.apache.hadoop.mapred.JobTracker: Adding task 'attempt_200812221742_0075_r_000000_1' to tip task_200812221742_0075_r_000000, for tracker 'tracker_hnode3.cor.mystrands.in:localhost/127.0.0.1:37777'
2008-12-23 19:04:44,237 INFO org.apache.hadoop.mapred.TaskInProgress: Error from attempt_200812221742_0075_r_000000_1: java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 1
    at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:301)
    at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:518)
    at org.apache.hadoop.streaming.PipeReducer.reduce(PipeReducer.java:102)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:318)
    at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2207)

2008-12-23 19:04:47,845 INFO org.apache.hadoop.mapred.JobTracker: Removed completed task 'attempt_200812221742_0075_r_000000_1' from 'tracker_hnode3.cor.mystrands.in:localhost/127.0.0.1:37777'
2008-12-23 19:04:47,856 INFO org.apache.hadoop.mapred.JobTracker: Adding task 'attempt_200812221742_0075_r_000000_2' to tip task_200812221742_0075_r_000000, for tracker 'tracker_hnode1.cor.mystrands.in:localhost/127.0.0.1:37971'
2008-12-23 19:04:57,781 INFO org.apache.hadoop.mapred.TaskInProgress: Error from attempt_200812221742_0075_r_000000_2: java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 1
    at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:301)
    at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:518)
    at org.apache.hadoop.streaming.PipeReducer.reduce(PipeReducer.java:102)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:318)
    at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2207)

2008-12-23 19:04:57,781 INFO org.apache.hadoop.mapred.JobTracker: Removed completed task 'attempt_200812221742_0075_r_000000_2' from 'tracker_hnode1.cor.mystrands.in:localhost/127.0.0.1:37971'

Thanks,
RDH

On Dec 23, 2008, at 1:00 AM, Amareshwari Sriramadasu wrote:

You can report status from a streaming job by emitting reporter:status:<message> on stderr. See the documentation @ http://hadoop.apache.org/core/docs/r0.18.2/streaming.html#How+do+I+update+status+in+streaming+applications%3F

But from the exception trace, it doesn't look like a lack of status reports (a timeout). The trace says the reducer subprocess exited with exit code 1, which most likely points to a bug in the reducer code. What is the configured value of the property "stream.non.zero.exit.status.is.failure"?
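To illustrate why the real cause can be hard to see (this is a hypothetical sum-per-key reducer, not the actual script from this thread): wrapping the reducer body so that any exception is printed to stderr before the process exits non-zero puts the underlying traceback into the task's stderr log, instead of leaving only "subprocess failed with code 1" in the job tracker log.

```python
import sys
import traceback

def reduce_stream(stream):
    # Hypothetical reducer: sum integer values per key from
    # key-sorted "key<TAB>value" lines, as streaming delivers them.
    current_key, total = None, 0
    for line in stream:
        key, _, value = line.rstrip("\n").partition("\t")
        if key != current_key:
            if current_key is not None:
                print("%s\t%d" % (current_key, total))
            current_key, total = key, 0
        total += int(value)
    if current_key is not None:
        print("%s\t%d" % (current_key, total))

if __name__ == "__main__":
    try:
        reduce_stream(sys.stdin)
    except Exception:
        # Surface the real failure in the task's stderr log, then exit
        # non-zero, which streaming reports as "failed with code 1".
        traceback.print_exc(file=sys.stderr)
        sys.exit(1)
```

With a wrapper like this, the task attempt's stderr (visible through the job tracker web UI) shows the Python traceback rather than just the generic PipeMapRed error.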

Thanks
Amareshwari
Rick Hangartner wrote:
Hi,

We seem to be seeing a runtime exception in the reduce phase of a streaming map-reduce job that has been mentioned before on this list.

http://mail-archives.apache.org/mod_mbox/hadoop-core-user/200805.mbox/%[email protected]%3e

When I Google the exception, the only result is this one short thread on the mailing list. Unfortunately, we don't quite understand the exception message in our current situation, or the eventual explanation and resolution of that previous case.

We have verified that the Python script run in the reduce phase works without problems: it returns the correct results when run from the command line, fed on stdin with a file containing the map-phase output of a small map-reduce job that fails in this way.
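That kind of local check can be sketched as the pipeline below (the file and script here are toy stand-ins invented for illustration, and `python3` is assumed; the key detail is the `sort`, since streaming hands the reducer key-sorted map output):

```shell
# Toy map output and reducer, standing in for the real ones.
printf 'b\t2\na\t1\na\t4\n' > map_output.txt
cat > reducer.py <<'EOF'
import sys
total = 0
for line in sys.stdin:
    total += int(line.split('\t')[1])
print(total)
EOF

# Emulate the reduce-side input ordering, then check the exit status:
# a non-zero code here would match the "subprocess failed with code 1" error.
sort map_output.txt | python3 reducer.py > local_output.txt
echo "reducer exit code: $?"
```

One caveat with this kind of test: a reducer can pass it and still fail on the cluster if the real map output differs from the sample (encoding, empty values, very large keys), so the exit status on real data is what the framework ultimately sees.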

Here's the exception we are seeing from the jobtracker log:

2008-12-22 18:13:36,415 INFO org.apache.hadoop.mapred.JobInProgress: Task 'attempt_200812221742_0004_m_000009_0' has completed task_200812221742_0004_m_000009 successfully.
2008-12-22 18:13:50,607 INFO org.apache.hadoop.mapred.TaskInProgress: Error from attempt_200812221742_0004_r_000000_0: java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 1
    at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:301)
    at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:518)
    at org.apache.hadoop.streaming.PipeReducer.reduce(PipeReducer.java:102)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:318)
    at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2207)

2008-12-22 18:13:52,045 INFO org.apache.hadoop.mapred.JobTracker: Removed completed task 'attempt_200812221742_0004_r_000000_0' from 'tracker_hnode3.cor.mystrands.in:localhost/127.0.0.1:37777'
2008-12-22 18:13:52,175 INFO org.apache.hadoop.mapred.JobTracker: Adding task 'attempt_200812221742_0004_r_000000_1' to tip task_200812221742_0004_r_000000, for tracker 'tracker_hnode5.cor.mystrands.in:localhost/127.0.0.1:55254'

We typically see four repetitions of this exception in the log, and we may see one or two sets of those repetitions.

If someone could explain what this exception actually means, and perhaps what we might need to change in our configuration to fix it, we would be most appreciative. Naively, it almost seems as if a task is taking slightly too long to complete and report that fact, perhaps because of other Hadoop or MR processes running at the same time. If we re-run this map-reduce job, it does sometimes run to a successful completion without an exception.

We are just testing map-reduce as a candidate for a number of data reduction tasks right now. We are running Hadoop 0.18.1 on a cluster of 9 retired desktop machines that have only 100Mb networking and about 2GB of RAM each, so we suspect this could just be a matter of tasks taking slightly too long to report back that they have completed, rather than an actual bug. (We will be upgrading this test cluster to Hadoop 0.19.x and 1Gb networking very shortly.)

Thanks,
RDH

Begin forwarded message:

From: "Rick Cox" <[email protected]>
Date: May 14, 2008 9:01:31 AM PDT
To: [email protected], [email protected]
Subject: Re: Streaming and subprocess error code
Reply-To: [email protected]

Does the syslog output from a should-have-failed task contain
something like this?

  java.lang.RuntimeException: PipeMapRed.waitOutputThreads():
subprocess failed with code 1

(In particular, I'm curious if it mentions the RuntimeException.)

Tasks that consume all their input and then exit non-zero are
definitely supposed to be counted as failed, so there's either a
problem with the setup or a bug somewhere.

rick

On Wed, May 14, 2008 at 8:49 PM, Andrey Pankov <[email protected]> wrote:
Hi,

I've tested this new option "-jobconf stream.non.zero.exit.status.is.failure=true". It seems to work, but it's still not good enough for me: when the mapper/reducer program has read all of its input data successfully and fails after that, streaming still finishes successfully, so there is no way to find out about data post-processing errors in the subprocesses :(
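A toy reproduction of the failure mode Andrey describes (the script, its record limit, and the failure condition are all invented for illustration): the process consumes every input record successfully and only afterwards fails, exiting non-zero during post-processing.

```python
import sys

def postprocess(count, limit):
    # Hypothetical post-processing step that can only fail after
    # all input records have already been consumed successfully.
    if count > limit:
        raise RuntimeError("too many records: %d" % count)
    print(count)

def main():
    # Consume all of stdin first...
    count = sum(1 for _ in sys.stdin)
    # ...then fail (or not) in post-processing.
    postprocess(count, limit=10)

if __name__ == "__main__":
    try:
        main()
    except Exception as exc:
        # Surface the error and exit non-zero, which is the exit status
        # stream.non.zero.exit.status.is.failure is meant to catch.
        sys.stderr.write("post-processing failed: %s\n" % exc)
        sys.exit(1)
```

Whether the framework notices this non-zero exit is exactly the behavior under discussion in this part of the thread.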



Andrey Pankov wrote:
