Hi Felix,
I can think of two options:
1) Set a longer timeout with -Dmapred.task.timeout=_____ (the value is in
milliseconds).
or
2) Have a separate thread that reports status back to the TaskTracker by
writing to stderr.
https://issues.apache.org/jira/browse/HADOOP-1328
Format: "reporter:status:____"
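For option 2, here is a minimal sketch of what the reporting thread could look like in Python. The "reporter:status:" prefix is the format above; the function name start_heartbeat, the 60-second interval, and the message text are only placeholders I made up for illustration. Pick an interval comfortably shorter than your task timeout.

```python
import sys
import threading


def start_heartbeat(interval_seconds=60.0, message="still working", stream=None):
    """Start a background thread that periodically writes a status line.

    Hypothetical helper: the name, defaults, and message are illustrative.
    Returns a threading.Event; call .set() on it to stop the heartbeat.
    """
    out = stream if stream is not None else sys.stderr
    stop = threading.Event()

    def beat():
        # Event.wait() returns False on timeout and True once stop is set,
        # so the loop emits one line per interval until it is told to stop.
        while not stop.wait(interval_seconds):
            out.write("reporter:status:%s\n" % message)
            out.flush()

    worker = threading.Thread(target=beat)
    worker.daemon = True  # do not keep the process alive if the main work exits
    worker.start()
    return stop
```

In the streaming script you would call stop = start_heartbeat() just before the long-running step (the gzip, say) and stop.set() in a finally block once it is done, so the status lines stop when the real work does.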
Hope it works.
Koji
On 1/28/11 3:51 PM, "felix gao" <[email protected]> wrote:
Mighty user group,
I am trying to write a streaming job that does a lot of I/O in a Python program.
I know that if I don't report back every x minutes, the job will be terminated.
How do I report back to the TaskTracker from my streaming Python job while it is
in the middle of, for example, a gzip?
Thanks,
Felix