[
https://issues.apache.org/jira/browse/TEZ-1893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14294361#comment-14294361
]
Hitesh Shah commented on TEZ-1893:
----------------------------------
This seems like a critical enough issue that it should also be fixed in previous
releases (0.5.x). Comments? \cc [~bikassaha] [~sseth] [~zjffdu]
> Some vertex init fail are still not propagated to clients
> ---------------------------------------------------------
>
> Key: TEZ-1893
> URL: https://issues.apache.org/jira/browse/TEZ-1893
> Project: Apache Tez
> Issue Type: Bug
> Reporter: Jeff Zhang
> Assignee: Jeff Zhang
>
> {code}
> throw new TezUncheckedException(vertex.getLogIdentifier() +
>     " has -1 tasks but does not have input initializers, " +
>     "1-1 uninited sources or custom vertex manager to set it at runtime");
> {code}
> IMO, this kind of verification could be done on the client side (DAG.verify).
> The following messages appear on the client side. The client cannot get the
> real status of the DAG because the Tez AM is killed due to this vertex init
> error:
> {code}
> 19:25:33,716 - Thread( main) - (RMProxy.java:98) - Connecting to
> ResourceManager at /0.0.0.0:8032
> 19:25:33,717 - Thread( main) - (AHSProxy.java:42) - Connecting to Application
> History server at /0.0.0.0:10200
> 19:25:34,724 - Thread( main) - (Client.java:858) - Retrying connect to
> server: localhost/127.0.0.1:6000. Already tried 0 time(s); retry policy is
> RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000
> MILLISECONDS)
> 19:25:35,725 - Thread( main) - (Client.java:858) - Retrying connect to
> server: localhost/127.0.0.1:6000. Already tried 1 time(s); retry policy is
> RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000
> MILLISECONDS)
> 19:25:36,726 - Thread( main) - (Client.java:858) - Retrying connect to
> server: localhost/127.0.0.1:6000. Already tried 2 time(s); retry policy is
> RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000
> MILLISECONDS)
> 19:25:36,846 - Thread( main) - (DAGClientImpl.java:463) - DAG initialized:
> CurrentState=Running
> 19:25:38,351 - Thread( main) - (Client.java:858) - Retrying connect to
> server: localhost/127.0.0.1:6000. Already tried 0 time(s); retry policy is
> RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000
> MILLISECONDS)
> 19:25:39,352 - Thread( main) - (Client.java:858) - Retrying connect to
> server: localhost/127.0.0.1:6000. Already tried 1 time(s); retry policy is
> RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000
> MILLISECONDS)
> 19:25:40,354 - Thread( main) - (Client.java:858) - Retrying connect to
> server: localhost/127.0.0.1:6000. Already tried 2 time(s); retry policy is
> RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000
> MILLISECONDS)
> 19:25:41,356 - Thread( main) - (Client.java:858) - Retrying connect to
> server: localhost/127.0.0.1:6000. Already tried 3 time(s); retry policy is
> RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000
> MILLISECONDS)
> 19:25:42,357 - Thread( main) - (Client.java:858) - Retrying connect to
> server: localhost/127.0.0.1:6000. Already tried 4 time(s); retry policy is
> RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000
> MILLISECONDS)
> 19:25:43,358 - Thread( main) - (Client.java:858) - Retrying connect to
> server: localhost/127.0.0.1:6000. Already tried 5 time(s); retry policy is
> RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000
> MILLISECONDS)
> 19:25:44,359 - Thread( main) - (Client.java:858) - Retrying connect to
> server: localhost/127.0.0.1:6000. Already tried 6 time(s); retry policy is
> RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000
> MILLISECONDS)
> 19:25:45,360 - Thread( main) - (Client.java:858) - Retrying connect to
> server: localhost/127.0.0.1:6000. Already tried 7 time(s); retry policy is
> RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000
> MILLISECONDS)
> 19:25:46,361 - Thread( main) - (Client.java:858) - Retrying connect to
> server: localhost/127.0.0.1:6000. Already tried 8 time(s); retry policy is
> RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000
> MILLISECONDS)
> 19:25:47,362 - Thread( main) - (Client.java:858) - Retrying connect to
> server: localhost/127.0.0.1:6000. Already tried 9 time(s); retry policy is
> RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000
> MILLISECONDS)
> 19:25:47,369 - Thread( main) - (DAGClientImpl.java:463) - DAG completed.
> FinalState=FAILED
> 19:25:47,369 - Thread( main) - (TezWordCount.java:203) - status=FAILED,
> progress=null, diagnostics=Session stats:submittedDAGs=0, successfulDAGs=0,
> failedDAGs=0, killedDAGs=0
> , counters=null
> 19:25:47,372 - Thread( main) - (TezClient.java:470) - Shutting down Tez
> Session, sessionName=commonName, applicationId=application_1420335690331_0007
> 19:25:47,374 - Thread( main) - (TezClientUtils.java:838) - Application not
> running, applicationId=application_1420335690331_0007,
> yarnApplicationState=FINISHED, finalApplicationStatus=FAILED,
> trackingUrl=http://localhost:8088/proxy/application_1420335690331_0007/A,
> diagnostics=Session stats:submittedDAGs=0, successfulDAGs=0, failedDAGs=0,
> killedDAGs=0
> 19:25:47,375 - Thread( main) - (TezClient.java:484) - Failed to shutdown Tez
> Session via proxy
> org.apache.tez.dag.api.SessionNotRunning: Application not running,
> applicationId=application_1420335690331_0007, yarnApplicationState=FINISHED,
> finalApplicationStatus=FAILED,
> trackingUrl=http://localhost:8088/proxy/application_1420335690331_0007/A,
> diagnostics=Session stats:submittedDAGs=0, successfulDAGs=0, failedDAGs=0,
> killedDAGs=0
> at org.apache.tez.client.TezClientUtils.getSessionAMProxy(TezClientUtils.java:839)
> at org.apache.tez.client.TezClient.getSessionAMProxy(TezClient.java:669)
> at org.apache.tez.client.TezClient.stop(TezClient.java:476)
> at com.zjffdu.tez.tutorial.TezWordCount.main(TezWordCount.java:204)
> 19:25:47,377 - Thread( main) - (TezClient.java:489) - Could not connect to
> AM, killing session via YARN, sessionName=commonName,
> applicationId=application_1420335690331_0007
> 19:25:47,381 - Thread( main) - (YarnClientImpl.java:364) - Killed application
> application_1420335690331_0007
> {code}
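
The client-side verification suggested in the issue description could look roughly like the sketch below. It is a minimal, self-contained illustration and not the Tez API: `VertexParallelismCheck`, `VertexSpec`, and all of their fields are hypothetical names, and it assumes the client can tell before submission whether a vertex has an input initializer, a 1-1 source, or a custom vertex manager. Whether a 1-1 source is still uninitialized is only known at runtime, so the sketch treats any 1-1 source as sufficient.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical, simplified stand-in for a client-side DAG check; not the Tez API. */
public class VertexParallelismCheck {

    /** Minimal description of a vertex as the client knows it before submission. */
    public static final class VertexSpec {
        final String name;
        final int numTasks;                    // -1 means "to be decided at runtime"
        final boolean hasInputInitializer;
        final boolean hasOneToOneSource;
        final boolean hasCustomVertexManager;

        public VertexSpec(String name, int numTasks, boolean hasInputInitializer,
                          boolean hasOneToOneSource, boolean hasCustomVertexManager) {
            this.name = name;
            this.numTasks = numTasks;
            this.hasInputInitializer = hasInputInitializer;
            this.hasOneToOneSource = hasOneToOneSource;
            this.hasCustomVertexManager = hasCustomVertexManager;
        }
    }

    /** Returns one message per vertex whose task count nothing could set at runtime. */
    public static List<String> findUnresolvableVertices(List<VertexSpec> vertices) {
        List<String> problems = new ArrayList<>();
        for (VertexSpec v : vertices) {
            boolean resolvableAtRuntime =
                v.hasInputInitializer || v.hasOneToOneSource || v.hasCustomVertexManager;
            if (v.numTasks == -1 && !resolvableAtRuntime) {
                problems.add(v.name + " has -1 tasks but does not have input initializers, "
                    + "1-1 sources or custom vertex manager to set it at runtime");
            }
        }
        return problems;
    }

    /** Fails fast on the client instead of letting the AM die during vertex init. */
    public static void verify(List<VertexSpec> vertices) {
        List<String> problems = findUnresolvableVertices(vertices);
        if (!problems.isEmpty()) {
            throw new IllegalStateException(String.join("; ", problems));
        }
    }

    public static void main(String[] args) {
        List<VertexSpec> dag = Arrays.asList(
            new VertexSpec("tokenizer", -1, false, false, false), // nothing can set the task count
            new VertexSpec("summation", 2, false, false, false)); // fixed task count, fine
        try {
            verify(dag);
        } catch (IllegalStateException e) {
            System.out.println("DAG rejected on the client: " + e.getMessage());
        }
    }
}
```

With a check of this kind, the misconfigured vertex would be reported before submission, so the client would not have to infer the failure from a killed AM and repeated connection retries.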