The only layer I can think of in that path is the recovery layer, which makes a blocking call that writes the DAG submission information to the recovery log. If that write fails, the DAG submission should also fail. It could potentially be optimized to let the DAG submission go through and run the DAG with recovery disabled, though that is a double-edged sword if the DAG is a large one.
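
For reference, the shape of that path is roughly the following. This is an illustrative sketch only; the class and method names (DagSubmissionSketch, RecoveryLog, writeDagSubmitted, startDag) are made up and do not match the actual Tez code:

import java.io.IOException;

// Illustrative sketch only: names here are made up, not the actual Tez code.
class DagSubmissionSketch {

  interface RecoveryLog {
    // Blocking, durable write of the DAG_SUBMITTED event.
    void writeDagSubmitted(String dagId) throws IOException;
  }

  private final RecoveryLog recoveryLog;

  DagSubmissionSketch(RecoveryLog recoveryLog) {
    this.recoveryLog = recoveryLog;
  }

  // Current behaviour: the recovery write happens inline on the submission
  // path, so a failed write fails the whole submission.
  void submitDag(String dagId) throws IOException {
    recoveryLog.writeDagSubmitted(dagId);
    startDag(dagId, true);
  }

  // The possible optimization mentioned above: let the DAG run anyway, but
  // with recovery disabled. Risky for a large DAG, since an AM restart
  // would then have nothing to recover from.
  void submitDagBestEffort(String dagId) {
    boolean recoveryEnabled;
    try {
      recoveryLog.writeDagSubmitted(dagId);
      recoveryEnabled = true;
    } catch (IOException e) {
      recoveryEnabled = false; // run without a safety net
    }
    startDag(dagId, recoveryEnabled);
  }

  private void startDag(String dagId, boolean recoveryEnabled) {
    // hand the DAG off to the scheduler / state machine
  }
}

The trade-off in submitDagBestEffort is exactly the double-edged sword above: the DAG gets to run, but a mid-run AM restart would have no recovery data to replay.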
However, that does not really explain the “YarnUncaughtExceptionHandler: Thread Thread[IPC Server handler 0 on 50500,5,main] threw an Error. Shutting down now...” message. (A short sketch of how an Error escaping an IPC handler thread triggers that shutdown follows the quoted mail below.) TEZ-2663 is something that could be applicable to branch 0.6, but I am not sure whether it will help track this particular issue down.

thanks
— Hitesh

On Sep 18, 2015, at 8:41 AM, Jonathan Eagles <[email protected]> wrote:

> I have a user running Tez 0.6.1 (git hash
> 6e588d15184dc691df1c0227f40db91d9bc6d7d6), and a few times a month the
> submitted job returns this error.
>
> 2015-09-16 01:04:05,069 INFO [IPC Server handler 0 on 50500]
> app.DAGAppMaster: Running DAG: PigLatin:user_1.pig-0_scope-0
> 2015-09-16 01:04:05,617 INFO [IPC Server handler 0 on 50500]
> history.HistoryEventHandler:
> [HISTORY][DAG:dag_1440165794704_806241_1][Event:DAG_SUBMITTED]:
> dagID=dag_1440165794704_806241_1, submitTime=1442365444946
> 2015-09-16 01:04:05,676 WARN [IPC Server handler 0 on 50500]
> ipc.Server: IPC Server handler 0 on 50500, call
> org.apache.tez.dag.api.client.rpc.DAGClientAMProtocolBlockingPB.submitDAG
> from hostname:55741 Call#585 Retry#0
> 2015-09-16 01:04:05,679 FATAL [IPC Server handler 0 on 50500]
> yarn.YarnUncaughtExceptionHandler: Thread Thread[IPC Server handler 0
> on 50500,5,main] threw an Error. Shutting down now...
> 2015-09-16 01:04:05,764 INFO [IPC Server handler 0 on 50500]
> util.ExitUtil: Exiting with status -1
>
>
> Normally it looks like this:
>
> 2015-09-17 10:03:40,234 INFO [IPC Server handler 0 on 50503]
> app.DAGAppMaster: Running DAG: PigLatin:user_1.pig-0_scope-0
> 2015-09-17 10:03:40,659 INFO [IPC Server handler 0 on 50503]
> history.HistoryEventHandler:
> [HISTORY][DAG:dag_1440165794704_873289_1][Event:DAG_SUBMITTED]:
> dagID=dag_1440165794704_873289_1, submitTime=1442484220062
> 2015-09-17 10:03:40,694 INFO [IPC Server handler 0 on 50503]
> impl.VertexImpl: setting additional outputs for vertex scope-2573
> 2015-09-17 10:03:40,696 INFO [IPC Server handler 0 on 50503]
> impl.DAGImpl: Using DAG Scheduler:
> org.apache.tez.dag.app.dag.impl.DAGSchedulerNaturalOrder
> 2015-09-17 10:03:40,698 INFO [IPC Server handler 0 on 50503]
> history.HistoryEventHandler:
> [HISTORY][DAG:dag_1440165794704_873289_1][Event:DAG_INITIALIZED]:
> dagID=dag_1440165794704_873289_1, initTime=1442484220660
> 2015-09-17 10:03:40,698 INFO [IPC Server handler 0 on 50503]
> impl.DAGImpl: dag_1440165794704_873289_1 transitioned from NEW to
> INITED
> ...
>
> and then it succeeds.
>
> Are there any known escaped exceptions over RPC that have been fixed
> and would solve this issue?
>
> Jon
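
Regarding the shutdown path mentioned above: the key detail in the failing log is that the IPC handler thread threw an Error rather than an Exception, and the YARN uncaught-exception handler treats an Error as fatal and exits the JVM with status -1. Below is a JDK-only illustration of that behaviour; it is a simplified stand-in, not Hadoop's actual YarnUncaughtExceptionHandler code:

// JDK-only sketch of the shutdown behaviour seen in the failing log: an
// Error (as opposed to an Exception) escaping a thread with this kind of
// handler installed takes the whole JVM down with exit status -1.
// Illustrative only, not Hadoop's actual implementation.
public class ShutdownOnErrorDemo {

  static class ShutdownOnError implements Thread.UncaughtExceptionHandler {
    @Override
    public void uncaughtException(Thread t, Throwable e) {
      if (e instanceof Error) {
        // Mirrors "threw an Error. Shutting down now..." followed by
        // "util.ExitUtil: Exiting with status -1" in the AM log.
        System.err.println("Thread " + t + " threw an Error. Shutting down now...");
        System.exit(-1);
      } else {
        // A plain Exception is only logged; the process keeps running.
        System.err.println("Thread " + t + " threw an Exception: " + e);
      }
    }
  }

  public static void main(String[] args) throws InterruptedException {
    Thread.setDefaultUncaughtExceptionHandler(new ShutdownOnError());
    // Simulate an Error escaping an IPC handler thread during submitDAG.
    Thread ipcHandler = new Thread(() -> {
      throw new NoClassDefFoundError("simulated escaped Error");
    }, "IPC Server handler 0 on 50500");
    ipcHandler.start();
    ipcHandler.join();
  }
}

So the useful next step is identifying which Error escapes during submitDAG; the ipc.Server WARN immediately before the FATAL suggests it surfaced while that call was being processed.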
