I have a user running Tez 0.6.1 (git hash 6e588d15184dc691df1c0227f40db91d9bc6d7d6), and a few times a month a submitted job fails with this error:
2015-09-16 01:04:05,069 INFO [IPC Server handler 0 on 50500] app.DAGAppMaster: Running DAG: PigLatin:user_1.pig-0_scope-0
2015-09-16 01:04:05,617 INFO [IPC Server handler 0 on 50500] history.HistoryEventHandler: [HISTORY][DAG:dag_1440165794704_806241_1][Event:DAG_SUBMITTED]: dagID=dag_1440165794704_806241_1, submitTime=1442365444946
2015-09-16 01:04:05,676 WARN [IPC Server handler 0 on 50500] ipc.Server: IPC Server handler 0 on 50500, call org.apache.tez.dag.api.client.rpc.DAGClientAMProtocolBlockingPB.submitDAG from hostname:55741 Call#585 Retry#0
2015-09-16 01:04:05,679 FATAL [IPC Server handler 0 on 50500] yarn.YarnUncaughtExceptionHandler: Thread Thread[IPC Server handler 0 on 50500,5,main] threw an Error. Shutting down now...
2015-09-16 01:04:05,764 INFO [IPC Server handler 0 on 50500] util.ExitUtil: Exiting with status -1

Normally it looks like this:

2015-09-17 10:03:40,234 INFO [IPC Server handler 0 on 50503] app.DAGAppMaster: Running DAG: PigLatin:user_1.pig-0_scope-0
2015-09-17 10:03:40,659 INFO [IPC Server handler 0 on 50503] history.HistoryEventHandler: [HISTORY][DAG:dag_1440165794704_873289_1][Event:DAG_SUBMITTED]: dagID=dag_1440165794704_873289_1, submitTime=1442484220062
2015-09-17 10:03:40,694 INFO [IPC Server handler 0 on 50503] impl.VertexImpl: setting additional outputs for vertex scope-2573
2015-09-17 10:03:40,696 INFO [IPC Server handler 0 on 50503] impl.DAGImpl: Using DAG Scheduler: org.apache.tez.dag.app.dag.impl.DAGSchedulerNaturalOrder
2015-09-17 10:03:40,698 INFO [IPC Server handler 0 on 50503] history.HistoryEventHandler: [HISTORY][DAG:dag_1440165794704_873289_1][Event:DAG_INITIALIZED]: dagID=dag_1440165794704_873289_1, initTime=1442484220660
2015-09-17 10:03:40,698 INFO [IPC Server handler 0 on 50503] impl.DAGImpl: dag_1440165794704_873289_1 transitioned from NEW to INITED

... and then it succeeds.

Are there any known cases of exceptions escaping over RPC that have since been fixed and would explain this?

Jon
