[ https://issues.apache.org/jira/browse/TEZ-4488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
László Bodor updated TEZ-4488:
------------------------------
Description:
{code}
query-coordinator <11>1 2023-04-03T12:54:12.056Z query-coordinator-0-0 query-coordinator 1 10ea11e4-d4dc-4231-878e-0c8c07eda53b [mdc@18060 class="impl.DAGImpl" level="ERROR" thread="IPC Server handler 1 on 22222"] Uncaught Exception when handling event DAG_INIT on Dag dag_1680526446742_0000_1 at currentState=NEW
java.lang.NullPointerException
    at org.apache.tez.dag.app.rm.TaskSchedulerManager.getTaskSchedulerClassName(TaskSchedulerManager.java:1082)
    at org.apache.tez.dag.app.DAGAppMaster$RunningAppContext.getTaskSchedulerClassName(DAGAppMaster.java:1702)
    at org.apache.tez.dag.app.dag.impl.VertexImpl.<init>(VertexImpl.java:1061)
    at org.apache.tez.dag.app.dag.impl.DAGImpl.createVertex(DAGImpl.java:1741)
    at org.apache.tez.dag.app.dag.impl.DAGImpl.initializeDAG(DAGImpl.java:1596)
    at org.apache.tez.dag.app.dag.impl.DAGImpl$InitTransition.transition(DAGImpl.java:1869)
    at org.apache.tez.dag.app.dag.impl.DAGImpl$InitTransition.transition(DAGImpl.java:1846)
    at org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385)
    at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
    at org.apache.hadoop.yarn.state.StateMachineFactory.access$500(StateMachineFactory.java:46)
    at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:487)
    at org.apache.tez.state.StateMachineTez.doTransition(StateMachineTez.java:59)
    at org.apache.tez.dag.app.dag.impl.DAGImpl.handle(DAGImpl.java:1219)
    at org.apache.tez.dag.app.dag.impl.DAGImpl.handle(DAGImpl.java:158)
    at org.apache.tez.dag.app.DAGAppMaster$DagEventDispatcher.handle(DAGAppMaster.java:2231)
    at org.apache.tez.dag.app.DAGAppMaster.startDAGExecution(DAGAppMaster.java:2608)
    at org.apache.tez.dag.app.DAGAppMaster.startDAG(DAGAppMaster.java:2573)
    at org.apache.tez.dag.app.DAGAppMaster.submitDAGToAppMaster(DAGAppMaster.java:1379)
    at org.apache.tez.dag.api.client.DAGClientHandler.submitDAG(DAGClientHandler.java:145)
    at org.apache.tez.dag.api.client.rpc.DAGClientAMProtocolBlockingPBServerImpl.submitDAG(DAGClientAMProtocolBlockingPBServerImpl.java:187)
    at org.apache.tez.dag.api.client.rpc.DAGClientAMProtocolRPC$DAGClientAMProtocol$2.callBlockingMethod(DAGClientAMProtocolRPC.java:8519)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:533)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1070)
    at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:989)
    at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:917)
    at java.base/java.security.AccessController.doPrivileged(Native Method)
    at java.base/javax.security.auth.Subject.doAs(Subject.java:423)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1899)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2894)
{code}
Currently, TaskSchedulerManager depends on clientRpcServer, so
TaskSchedulerManager waits for clientRpcServer to start. Once started,
clientRpcServer handles requests from clients such as HS2, so HS2 is able to
submit a DAG even before TaskSchedulerManager is initialized. That is when
the NullPointerException above is thrown from getTaskSchedulerClassName.

We cannot change the order of the service dependency, because
TaskSchedulerManager needs the app host/port. We might therefore simply block
the very first DAG submission until TaskSchedulerManager is ready.
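
The blocking idea could be sketched roughly as follows. This is only an
illustration under stated assumptions, not the actual Tez change:
SchedulerReadyGate, DagSubmitter, markReady, awaitReady and the timeout value
are hypothetical names introduced here, and the real fix may look quite
different.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

/** Hypothetical one-shot gate that opens once the scheduler is initialized. */
final class SchedulerReadyGate {
  private final CountDownLatch ready = new CountDownLatch(1);

  /** Called by the scheduler service at the end of its startup. */
  void markReady() {
    ready.countDown();
  }

  /** Waits until markReady() was called; returns false if the timeout elapsed first. */
  boolean awaitReady(long timeout, TimeUnit unit) throws InterruptedException {
    return ready.await(timeout, unit);
  }
}

/** Hypothetical stand-in for the RPC-facing submit path. */
final class DagSubmitter {
  private final SchedulerReadyGate gate;
  private final long timeoutMillis;

  DagSubmitter(SchedulerReadyGate gate, long timeoutMillis) {
    this.gate = gate;
    this.timeoutMillis = timeoutMillis;
  }

  /** Blocks the submission until the gate opens instead of failing with an NPE later. */
  String submit(String dagName) {
    try {
      if (!gate.awaitReady(timeoutMillis, TimeUnit.MILLISECONDS)) {
        throw new IllegalStateException(
            "Task scheduler not ready after " + timeoutMillis + " ms, rejecting " + dagName);
      }
    } catch (InterruptedException e) {
      // Preserve the interrupt status for the calling RPC handler thread.
      Thread.currentThread().interrupt();
      throw new IllegalStateException("Interrupted while waiting for the task scheduler", e);
    }
    return "accepted:" + dagName;
  }
}
```

With a gate like this, only submissions that arrive during startup pay the
wait; once the latch is open, awaitReady returns immediately. A bounded
timeout keeps an RPC handler thread from hanging forever if the scheduler
never comes up.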
> TaskSchedulerManager might not be initialized when the first DAG comes
> ----------------------------------------------------------------------
>
> Key: TEZ-4488
> URL: https://issues.apache.org/jira/browse/TEZ-4488
> Project: Apache Tez
> Issue Type: Bug
> Reporter: László Bodor
> Assignee: László Bodor
> Priority: Major
--
This message was sent by Atlassian Jira
(v8.20.10#820010)