Hi Zheng,

Your help was significant. It was my mistake: I mixed up the option names. Now it works as desired. Thanks a lot!

Zheng Shao wrote:
See
https://issues.apache.org/jira/secure/attachment/12369344/exit-status-2057-0.16.patch

The option is called stream.non.zero.exit.is.failure, not
stream.non.zero.exit.status.is.failure.


Some users (including me) are pushing to make this option default to
true, but there is no response yet.
Dhruba, maybe you can help push that?

Zheng
-----Original Message-----
From: Joydeep Sen Sarma
Sent: Wednesday, May 14, 2008 3:02 PM
To: Zheng Shao
Subject: FW: Streaming and subprocess error code

Looks like the bug is not fixed correctly in trunk.

-----Original Message-----
From: Andrey Pankov [mailto:[EMAIL PROTECTED]
Sent: Wednesday, May 14, 2008 8:19 AM
To: [email protected]
Subject: Re: Streaming and subprocess error code

Hi,

I've tested this new option "-jobconf stream.non.zero.exit.status.is.failure=true". It seems to work, but it is still not good enough for me. When the mapper/reducer program has read all input data successfully and fails after that, streaming still finishes successfully, so there is no way to find out about data post-processing errors in subprocesses :(
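A minimal Python sketch of the failure mode described above: a streaming mapper that consumes all of its input and only then runs a post-processing step. Once the input is fully read, the process exit status is the only signal left for streaming to notice the failure, so the mapper returns 1 when post-processing raises. `post_process` and `run_mapper` are hypothetical names for illustration, not part of Hadoop.

```python
import sys


def post_process(record_count):
    """Hypothetical post-processing step run after all input is consumed.

    Raises an exception to simulate a late failure.
    """
    if record_count == 0:
        raise ValueError("no input records seen")


def run_mapper(lines, emit, report):
    """Identity mapper: pass every input line to `emit`, then post-process.

    Returns the exit status the process should use: 0 on success,
    1 if post-processing failed after all input had been read.
    """
    count = 0
    for line in lines:
        emit(line)  # identity map: output each record unchanged
        count += 1
    try:
        post_process(count)
    except Exception as exc:
        # All input is already consumed, so the exit status is the only
        # way for streaming to detect this failure.
        report(f"post-processing failed: {exc}\n")
        return 1
    return 0


if __name__ == "__main__":
    sys.exit(run_mapper(sys.stdin, sys.stdout.write, sys.stderr.write))
```

Passing `emit`/`report` as callables keeps the logic testable without real stdin/stdout.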

Andrey Pankov wrote:
Hi Rick,

Thank you for the quick response! I see this feature is in trunk and
not available in the last stable release. Anyway, I will try it from trunk and check whether it catches segmentation faults too.


Rick Cox wrote:
Try "-jobconf stream.non.zero.exit.status.is.failure=true".

That will tell streaming that a non-zero exit is a task failure. To
turn that into an immediate whole job failure, I think configuring 0
task retries (mapred.map.max.attempts=1 and
mapred.reduce.max.attempts=1) will be sufficient.
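Rick's suggestion can be sketched as a Python helper that assembles the streaming command line, using the option name as corrected by Zheng (stream.non.zero.exit.is.failure) plus a single attempt per task. The jar path, input/output paths and script names are placeholders, and `streaming_command`/`run_job` are hypothetical helpers. The check in `run_job` assumes the hadoop client exits non-zero when the job fails.

```python
import subprocess


def streaming_command(jar, input_path, output_path, mapper, reducer,
                      fail_fast=True):
    """Build a Hadoop streaming command line as an argument list.

    `jar` and the paths are placeholders; adjust them for your install.
    """
    cmd = ["hadoop", "jar", jar,
           "-input", input_path,
           "-output", output_path,
           "-mapper", mapper,
           "-reducer", reducer,
           # Treat a non-zero child exit status as a task failure
           # (option name as corrected in this thread).
           "-jobconf", "stream.non.zero.exit.is.failure=true"]
    if fail_fast:
        # One attempt per task, so a single task failure fails the job.
        cmd += ["-jobconf", "mapred.map.max.attempts=1",
                "-jobconf", "mapred.reduce.max.attempts=1"]
    return cmd


def run_job(cmd):
    """Run the job; assumes the hadoop client exits non-zero on failure."""
    return subprocess.run(cmd).returncode == 0
```

For example, `run_job(streaming_command("hadoop-streaming.jar", "in", "out", "mapper.py", "reducer.py"))` would then return False for a failed job, with no log parsing involved.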

rick

On Tue, May 13, 2008 at 8:15 PM, Andrey Pankov <[EMAIL PROTECTED]> wrote:
Hi all,

I'm looking for a way to force Streaming to shut down the whole job when
one of its subprocesses exits with a non-zero error code.

We have the following situation. Sometimes either the mapper or the reducer crashes; as a rule it returns some exit code. In this case the entire streaming job still finishes successfully, which is wrong. The same happens when a subprocess terminates
with a segmentation fault.
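A process killed by a segmentation fault does not exit with an ordinary status code; it is terminated by the SIGSEGV signal. A minimal Python sketch (POSIX only) of how a parent process can tell the two cases apart: `subprocess` reports death by signal as a negative return code (-11 for SIGSEGV on Linux), whereas bash would report 139 (128 + 11). Whether Hadoop streaming's own exit-status check treats signal deaths as failures is not confirmed in this thread; `describe_exit` and `run_child` are hypothetical helpers.

```python
import signal
import subprocess
import sys


def describe_exit(returncode):
    """Classify a subprocess returncode value (POSIX semantics)."""
    if returncode == 0:
        return "success"
    if returncode < 0:
        # subprocess reports death by signal N as returncode -N.
        return f"killed by {signal.Signals(-returncode).name}"
    return f"exit status {returncode}"


def run_child(code):
    """Run a Python snippet in a child interpreter and describe how it ended."""
    proc = subprocess.run([sys.executable, "-c", code])
    return describe_exit(proc.returncode)


if __name__ == "__main__":
    print(run_child("import sys; sys.exit(3)"))  # exit status 3
    print(run_child("import os, signal; os.kill(os.getpid(), signal.SIGSEGV)"))
```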

The only way to check automatically whether a subprocess crashed is via the logs,
but that means parsing tons of outputs/logs/dirs/etc.
To find the logs of your job you have to know its job ID, e.g.
job_200805130853_0016. I don't know an easy way to determine it other than
scanning stdout for the pattern. Then you have to find the logs of each mapper and each reducer,
find a way to parse them, etc.
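Scanning the client output for the job ID, as described above, can be done with a regular expression. A minimal Python sketch, assuming IDs have the shape of the example job_200805130853_0016: a 12-digit JobTracker identifier (which appears to be a yyyyMMddHHmm start timestamp) followed by a sequence number of at least four digits. `find_job_ids` is a hypothetical helper.

```python
import re

# Assumed shape: "job_" + 12-digit tracker identifier + "_" + sequence number.
JOB_ID = re.compile(r"job_\d{12}_\d{4,}")


def find_job_ids(text):
    """Return the distinct job IDs in `text`, in order of first appearance."""
    seen = []
    for match in JOB_ID.findall(text):
        if match not in seen:
            seen.append(match)
    return seen
```

Deduplicating keeps the result short when the client prints the same ID on many progress lines.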

So, is there an easier way to get the correct status of the whole streaming job, or do I still have to build rather fragile parsing systems for such purposes?

 Thanks in advance.

 --
 Andrey Pankov

--
Andrey Pankov
