Hi Rick,
Double-checked my test. The syslog output contains a message about a non-zero
exit code (in this case the mapper finished with a segfault):
2008-05-14 18:12:04,473 INFO org.apache.hadoop.streaming.PipeMapRed:
PipeMapRed.waitOutputThreads(): subprocess exited with code 134 in
org.apache.hadoop.streaming.PipeMapRed
stderr contains a message with a dump or something about the segfault.
The reducer also finished with an error:
2008-05-14 20:28:34,128 INFO org.apache.hadoop.streaming.PipeMapRed:
PipeMapRed.waitOutputThreads(): subprocess exited with code 55 in
org.apache.hadoop.streaming.PipeMapRed
Nevertheless, the entire job is reported as successful:
08/05/14 18:12:03 INFO streaming.StreamJob: map 0% reduce 0%
08/05/14 18:12:05 INFO streaming.StreamJob: map 100% reduce 0%
08/05/14 18:12:06 INFO streaming.StreamJob: map 100% reduce 100%
08/05/14 18:12:06 INFO streaming.StreamJob: Job complete:
job_200805131958_0020
08/05/14 18:12:06 INFO streaming.StreamJob: Output:
/user/hadoop/data1_result
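As an aside on the exit codes above: by the usual POSIX convention a status of 128 + N means "killed by signal N", so 134 is 128 + 6 (SIGABRT, e.g. glibc aborting the process), while a plain segfault (SIGSEGV, signal 11) would normally show up as 139. A small sketch of that convention, using Python's subprocess module:

```python
import subprocess
import sys

# Run a child process that kills itself with SIGABRT (signal 6).
proc = subprocess.run(
    [sys.executable, "-c", "import os, signal; os.kill(os.getpid(), signal.SIGABRT)"]
)

# subprocess reports a signal death as a negative returncode (-N);
# a shell (and the status a wrapper like PipeMapRed sees) is 128 + N.
print(proc.returncode)        # -6
print(128 - proc.returncode)  # 134
```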
Rick Cox wrote:
Hi,
Thanks: that message indicates the stream.non.zero.exit.is.failure
feature isn't enabled for this task; the log is just reporting the
exit status, but not raising the RuntimeException that it would if the
feature were turned on.
I've had problems getting this parameter through from the command line
before. If you've got access, you could try setting it in the
hadoop-site.xml instead (I think it should be the tasktrackers that
read that parameter).
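For reference, a hadoop-site.xml entry for this might look like the following sketch. Note the property is spelled stream.non.zero.exit.is.failure here but stream.non.zero.exit.status.is.failure elsewhere in this thread, so check which name your version actually reads:

```xml
<property>
  <name>stream.non.zero.exit.is.failure</name>
  <value>true</value>
</property>
```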
(Sorry about the confusion here, we've been using that patch for so
long I had forgotten it wasn't yet released, and I'm not exactly sure
where we stand with these other bugs.)
rick
On Wed, May 14, 2008 at 11:05 PM, Andrey Pankov <[EMAIL PROTECTED]> wrote:
Hi Rick,
Double-checked my test. The syslog output contains a message about a non-zero
exit code (in this case the mapper finished with a segfault):
2008-05-14 18:12:04,473 INFO org.apache.hadoop.streaming.PipeMapRed:
PipeMapRed.waitOutputThreads(): subprocess exited with code 134 in
org.apache.hadoop.streaming.PipeMapRed
stderr contains a message with a dump or something about the segfault.
The reducer also finished with an error:
2008-05-14 20:28:34,128 INFO org.apache.hadoop.streaming.PipeMapRed:
PipeMapRed.waitOutputThreads(): subprocess exited with code 55 in
org.apache.hadoop.streaming.PipeMapRed
Nevertheless, the entire job is reported as successful:
08/05/14 18:12:03 INFO streaming.StreamJob: map 0% reduce 0%
08/05/14 18:12:05 INFO streaming.StreamJob: map 100% reduce 0%
08/05/14 18:12:06 INFO streaming.StreamJob: map 100% reduce 100%
08/05/14 18:12:06 INFO streaming.StreamJob: Job complete:
job_200805131958_0020
08/05/14 18:12:06 INFO streaming.StreamJob: Output:
/user/hadoop/data1_result
Rick Cox wrote:
Does the syslog output from a should-have-failed task contain
something like this?
java.lang.RuntimeException: PipeMapRed.waitOutputThreads():
subprocess failed with code 1
(In particular, I'm curious if it mentions the RuntimeException.)
Tasks that consume all their input and then exit non-zero are
definitely supposed to be counted as failed, so there's either a
problem with the setup or a bug somewhere.
rick
On Wed, May 14, 2008 at 8:49 PM, Andrey Pankov <[EMAIL PROTECTED]>
wrote:
Hi,
I've tested this new option "-jobconf
stream.non.zero.exit.status.is.failure=true". It seems to work, but it is
still not good enough for me. When the mapper/reducer program has read all
input data successfully and fails after that, streaming still finishes
successfully, so there is no way to find out about data post-processing
errors in the subprocesses :(
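The failure mode described above can be sketched with a hypothetical streaming mapper that consumes all of stdin, emits its records, and only then exits non-zero (the mapper body and exit code 55 are illustrative, not from a real job):

```python
import subprocess
import sys
import textwrap

# Hypothetical mapper: reads every input line, emits key\t1 pairs,
# then fails during a final post-processing step.
mapper = textwrap.dedent("""\
    import sys
    for line in sys.stdin:
        key = line.rstrip("\\n").split("\\t", 1)[0]
        sys.stdout.write(key + "\\t1\\n")
    sys.exit(55)  # fails only after all input was consumed
""")

proc = subprocess.run(
    [sys.executable, "-c", mapper],
    input="a\tx\nb\ty\n", capture_output=True, text=True,
)
print(proc.stdout)      # all records were emitted...
print(proc.returncode)  # ...yet the process exited with 55
```

All the output looks complete, so unless the framework treats the non-zero exit as a task failure, nothing distinguishes this run from a successful one.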
Andrey Pankov wrote:
Hi Rick,
Thank you for the quick response! I see this feature is in trunk and not
available in the latest stable release. Anyway, I will try whether it works
for me from trunk, and will also check whether it catches segmentation
faults.
Rick Cox wrote:
Try "-jobconf stream.non.zero.exit.status.is.failure=true".
That will tell streaming that a non-zero exit is a task failure. To
turn that into an immediate whole job failure, I think configuring 0
task retries (mapred.map.max.attempts=1 and
mapred.reduce.max.attempts=1) will be sufficient.
rick
On Tue, May 13, 2008 at 8:15 PM, Andrey Pankov <[EMAIL PROTECTED]>
wrote:
Hi all,
I'm looking for a way to force Streaming to shut down the whole job when one
of its subprocesses exits with a non-zero exit code.
We have the following situation. Sometimes either the mapper or the reducer
can crash; as a rule it returns some exit code. In this case the entire
streaming job finishes successfully, but that's wrong. Almost the same
happens when a subprocess finishes with a segmentation fault.
It's possible to check automatically whether a subprocess crashed only via
the logs, but that means you need to parse tons of outputs/logs/dirs/etc.
In order to find the logs of your job you have to know its jobid, e.g.
job_200805130853_0016. I don't know an easy way to determine it other than
scanning stdout for the pattern. Then you have to find the logs of each
mapper and each reducer, find a way to parse them, etc., etc...
So, is there an easy way to get the correct status of the whole streaming
job, or do I still have to build a rather fragile parsing system for such
purposes?
Thanks in advance.
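The jobid scan described above can be sketched as a simple pattern match over the captured StreamJob output (the log lines are taken from this thread):

```python
import re

# Scan captured StreamJob stdout for the job id pattern job_<timestamp>_<seq>.
streamjob_output = (
    "08/05/14 18:12:06 INFO streaming.StreamJob: Job complete: "
    "job_200805131958_0020\n"
    "08/05/14 18:12:06 INFO streaming.StreamJob: Output: /user/hadoop/data1_result\n"
)

match = re.search(r"job_\d+_\d+", streamjob_output)
print(match.group(0))  # job_200805131958_0020
```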
--
Andrey Pankov