[ 
https://issues.apache.org/jira/browse/TEZ-3273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15301499#comment-15301499
 ] 

Feng Yuan commented on TEZ-3273:
--------------------------------

Here are the RM logs:
2016-05-26 10:33:31,765 INFO 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: Storing 
application with id application_1463493135662_90531
2016-05-26 10:33:31,765 INFO 
org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=bae      
IP=192.168.44.40        OPERATION=Submit Application Request    
TARGET=ClientRMService  RESULT=SUCCESS  APPID=application_1463493135662_90531
2016-05-26 10:33:31,765 INFO 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: 
application_1463493135662_90531 State change from NEW to NEW_SAVING
2016-05-26 10:33:31,765 INFO 
org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore: Storing 
info for app: application_1463493135662_90531
2016-05-26 10:33:31,772 INFO 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: 
application_1463493135662_90531 State change from NEW_SAVING to SUBMITTED
2016-05-26 10:33:31,772 INFO 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: 
Accepted application application_1463493135662_90531 from user: bae, in queue: 
default, currently num of applications: 37
2016-05-26 10:33:31,772 INFO 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: 
application_1463493135662_90531 State change from SUBMITTED to ACCEPTED
2016-05-26 10:33:39,102 INFO 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt: 
Making reservation: node=xx-44p123-hadoop109.xx.com 
app_id=application_1463493135662_90531
2016-05-26 10:33:39,130 INFO 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt: 
Making reservation: node=xx-44p112-hadoop98.xx.com 
app_id=application_1463493135662_90531
2016-05-26 10:33:39,134 INFO 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt: 
Making reservation: node=xx-44p118-hadoop104.xx.com 
app_id=application_1463493135662_90531
2016-05-26 10:33:39,155 INFO 
org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=bae      
OPERATION=AM Allocated Container        TARGET=SchedulerApp     RESULT=SUCCESS  
APPID=application_1463493135662_90531   
CONTAINERID=container_1463493135662_90531_01_000004
2016-05-26 10:33:39,155 INFO 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: 
Storing attempt: AppId: application_1463493135662_90531 AttemptId: 
appattempt_1463493135662_90531_000001 MasterContainer: Container: [ContainerId: 
container_1463493135662_90531_01_000004, NodeId: 
xx-44p104-hadoop90.xx.com:59978, NodeHttpAddress: 
xx-44p104-hadoop90.xx.com:8042, Resource: <memory:4096, vCores:1>, Priority: 0, 
Token: Token { kind: ContainerToken, service: 192.168.44.104:59978 }, ]
2016-05-26 10:33:39,276 INFO 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt: 
Application application_1463493135662_90531 unreserved  on node host: 
xx-44p118-hadoop104.xx.com:4695 #containers=17 available=2560 used=28160, 
currently has 2 at priority 0; currentReservation <memory:8192, vCores:2>
2016-05-26 10:33:40,106 INFO 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt: 
Application application_1463493135662_90531 unreserved  on node host: 
xx-44p123-hadoop109.xx.com:4335 #containers=15 available=2560 used=28160, 
currently has 1 at priority 0; currentReservation <memory:4096, vCores:1>
2016-05-26 10:33:40,136 INFO 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt: 
Application application_1463493135662_90531 unreserved  on node host: 
xx-44p112-hadoop98.xx.com:42781 #containers=16 available=2560 used=28160, 
currently has 0 at priority 0; currentReservation <memory:0, vCores:0>
2016-05-26 10:33:43,238 INFO 
org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=bae      
IP=192.168.44.104       OPERATION=Register App Master   
TARGET=ApplicationMasterService RESULT=SUCCESS  
APPID=application_1463493135662_90531   
APPATTEMPTID=appattempt_1463493135662_90531_000001
2016-05-26 10:33:43,238 INFO 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: 
application_1463493135662_90531 State change from ACCEPTED to RUNNING
2016-05-26 10:33:48,748 INFO 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: Updating 
application application_1463493135662_90531 with final state: FINISHING
2016-05-26 10:33:48,748 INFO 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: 
application_1463493135662_90531 State change from RUNNING to FINAL_SAVING
2016-05-26 10:33:48,750 INFO 
org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: Watcher 
event type: NodeDataChanged with state:SyncConnected for 
path:/rmstore/ZKRMStateRoot/RMAppRoot/application_1463493135662_90531/appattempt_1463493135662_90531_000001
 for Service 
org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore in state 
org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore: STARTED
2016-05-26 10:33:48,750 INFO 
org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore: Updating 
info for app: application_1463493135662_90531
2016-05-26 10:33:48,754 INFO 
org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: Watcher 
event type: NodeDataChanged with state:SyncConnected for 
path:/rmstore/ZKRMStateRoot/RMAppRoot/application_1463493135662_90531 for 
Service org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore in 
state org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore: 
STARTED
2016-05-26 10:33:48,754 INFO 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: 
application_1463493135662_90531 State change from FINAL_SAVING to FINISHING
2016-05-26 10:33:48,857 INFO 
org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService: 
application_1463493135662_90531 unregistered successfully. 
2016-05-26 10:33:49,050 INFO 
org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=bae      
OPERATION=AM Released Container TARGET=SchedulerApp     RESULT=SUCCESS  
APPID=application_1463493135662_90531   
CONTAINERID=container_1463493135662_90531_01_000004
2016-05-26 10:33:49,050 INFO 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: 
application_1463493135662_90531 State change from FINISHING to FINISHED
2016-05-26 10:33:49,050 INFO 
org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=bae      
OPERATION=Application Finished - Succeeded      TARGET=RMAppManager     
RESULT=SUCCESS  APPID=application_1463493135662_90531
2016-05-26 10:33:49,050 INFO 
org.apache.hadoop.yarn.server.resourcemanager.RMAppManager$ApplicationSummary: 
appId=application_1463493135662_90531,name=BFD_JOB_TASK_1415_20160526103310,user=bae,queue=root.default,state=FINISHED,trackingUrl=http://xx-44p12-rm01:8088/proxy/application_1463493135662_90531/A,appMasterHost=xx-44p104-hadoop90.xx.com,startTime=1464230011765,finishTime=1464230028748,finalStatus=SUCCEEDED
2016-05-26 10:33:49,051 INFO 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo: 
Application application_1463493135662_90531 requests cleared

There are many container reservations immediately followed by unreservations.
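The reserve/unreserve churn can be confirmed by counting both event types for the app id in the RM log. A minimal sketch (the excerpt below is inlined from the RM log above purely for illustration; in practice you would grep the full yarn-resourcemanager log file):

```shell
APP=application_1463493135662_90531

# Representative lines copied from the RM log excerpt above.
LOG_EXCERPT='Making reservation: node=xx-44p123-hadoop109.xx.com app_id=application_1463493135662_90531
Making reservation: node=xx-44p112-hadoop98.xx.com app_id=application_1463493135662_90531
Making reservation: node=xx-44p118-hadoop104.xx.com app_id=application_1463493135662_90531
Application application_1463493135662_90531 unreserved  on node host: xx-44p118-hadoop104.xx.com:4695
Application application_1463493135662_90531 unreserved  on node host: xx-44p123-hadoop109.xx.com:4335
Application application_1463493135662_90531 unreserved  on node host: xx-44p112-hadoop98.xx.com:42781'

# Every reservation should eventually be matched by an unreservation;
# a persistent gap between the two counts would point at leaked reservations.
echo "reservations:   $(printf '%s\n' "$LOG_EXCERPT" | grep -c "Making reservation.*app_id=${APP}")"
echo "unreservations: $(printf '%s\n' "$LOG_EXCERPT" | grep -c "Application ${APP} unreserved")"
```

Here both counts come out equal (3 and 3), so the scheduler did release everything it reserved for this attempt.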

NM log for the AM container:

2016-05-26 10:33:39,161 INFO 
org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl:
 Start request for container_1463493135662_90531_01_000004 by user bae
2016-05-26 10:33:39,161 INFO 
org.apache.hadoop.yarn.server.nodemanager.NMAuditLogger: USER=bae  
IP=192.168.44.12        OPERATION=Start Container Request       
TARGET=ContainerManageImpl      RESULT=SUCCESS  
APPID=application_1463493135662_90531   
CONTAINERID=container_1463493135662_90531_01_000004
2016-05-26 10:33:39,162 INFO 
org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application:
 Adding container_1463493135662_90531_01_000004 to application 
application_1463493135662_90531
2016-05-26 10:33:39,169 INFO 
org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: 
Container container_1463493135662_90531_01_000004 transitioned from NEW to 
LOCALIZING
2016-05-26 10:33:39,173 INFO 
org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService:
 Created localizer for container_1463493135662_90531_01_000004
2016-05-26 10:33:39,175 INFO 
org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService:
 Writing credentials to the nmPrivate file 
/opt/data5/yarn_dir/local/nmPrivate/container_1463493135662_90531_01_000004.tokens.
 Credentials list: 
2016-05-26 10:33:39,342 INFO 
org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Copying 
from 
/opt/data5/yarn_dir/local/nmPrivate/container_1463493135662_90531_01_000004.tokens
 to 
/opt/data8/yarn_dir/local/usercache/bae/appcache/application_1463493135662_90531/container_1463493135662_90531_01_000004.tokens
2016-05-26 10:33:39,941 INFO 
org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: 
Container container_1463493135662_90531_01_000004 transitioned from LOCALIZING 
to LOCALIZED
2016-05-26 10:33:39,964 INFO 
org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: 
Container container_1463493135662_90531_01_000004 transitioned from LOCALIZED 
to RUNNING
2016-05-26 10:33:39,969 INFO 
org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: 
launchContainer: [bash, 
/opt/data5/yarn_dir/local/usercache/bae/appcache/application_1463493135662_90531/container_1463493135662_90531_01_000004/default_container_executor.sh]
2016-05-26 10:33:42,883 INFO 
org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl:
 Starting resource-monitoring for container_1463493135662_90531_01_000004
2016-05-26 10:33:43,223 INFO 
org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl:
 Memory usage of ProcessTree 38928 for container-id 
container_1463493135662_90531_01_000004: 223.1 MB of 4 GB physical memory used; 
4.1 GB of 8.4 GB virtual memory used
2016-05-26 10:33:46,765 INFO 
org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl:
 Memory usage of ProcessTree 38928 for container-id 
container_1463493135662_90531_01_000004: 227.7 MB of 4 GB physical memory used; 
4.1 GB of 8.4 GB virtual memory used
2016-05-26 10:33:48,908 INFO 
org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch:
 Container container_1463493135662_90531_01_000004 succeeded 
2016-05-26 10:33:48,908 INFO 
org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: 
Container container_1463493135662_90531_01_000004 transitioned from RUNNING to 
EXITED_WITH_SUCCESS
2016-05-26 10:33:48,908 INFO 
org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch:
 Cleaning up container container_1463493135662_90531_01_000004
.....
2016-05-26 10:33:48,929 INFO 
org.apache.hadoop.yarn.server.nodemanager.NMAuditLogger: USER=bae  
OPERATION=Container Finished - Succeeded        TARGET=ContainerImpl    
RESULT=SUCCESS  APPID=application_1463493135662_90531   
CONTAINERID=container_1463493135662_90531_01_000004
2016-05-26 10:33:48,929 INFO 
org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: 
Container container_1463493135662_90531_01_000004 transitioned from 
EXITED_WITH_SUCCESS to DONE
2016-05-26 10:33:48,929 INFO 
org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application:
 Removing container_1463493135662_90531_01_000004 from application 
application_1463493135662_90531
2016-05-26 10:33:48,929 INFO 
org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.AppLogAggregatorImpl:
 Considering container container_1463493135662_90531_01_000004 for 
log-aggregation
2016-05-26 10:33:49,056 INFO 
org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl:
 Stopping container with container Id: container_1463493135662_90531_01_000004
2016-05-26 10:33:49,056 INFO 
org.apache.hadoop.yarn.server.nodemanager.NMAuditLogger: USER=bae  
IP=192.168.44.12        OPERATION=Stop Container Request        
TARGET=ContainerManageImpl      RESULT=SUCCESS  
APPID=application_1463493135662_90531   
CONTAINERID=container_1463493135662_90531_01_000004
2016-05-26 10:33:49,059 INFO 
org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Removed 
completed containers from NM context: [container_1463493135662_90531_01_000004]
2016-05-26 10:33:49,073 INFO 
org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.AppLogAggregatorImpl:
 Uploading logs for container container_1463493135662_90531_01_000004. Current 
good log dirs are 
/opt/data10/yarn_dir/local,/opt/data2/yarn_dir/local,/opt/data4/yarn_dir/local,/opt/data6/yarn_dir/local,/opt/data8/yarn_dir/local,/opt/data5/yarn_dir/local,/opt/data1/yarn_dir/local,/opt/data9/yarn_dir/local,/opt/data7/yarn_dir/local,/opt/data3/yarn_dir/local
.....
2016-05-26 10:33:49,939 INFO 
org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl:
 Stopping resource-monitoring for container_1463493135662_90531_01_000004

You can see the AM launched and exited almost immediately.
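The AM container's lifecycle can be reconstructed from the NM log by pulling out the state-transition lines for its container id. A small sketch (transition lines inlined from the NM log above; in practice you would grep the NM log file directly):

```shell
CID=container_1463493135662_90531_01_000004

# State-transition lines copied from the NM log excerpt above.
NM_EXCERPT='Container container_1463493135662_90531_01_000004 transitioned from NEW to LOCALIZING
Container container_1463493135662_90531_01_000004 transitioned from LOCALIZING to LOCALIZED
Container container_1463493135662_90531_01_000004 transitioned from LOCALIZED to RUNNING
Container container_1463493135662_90531_01_000004 transitioned from RUNNING to EXITED_WITH_SUCCESS
Container container_1463493135662_90531_01_000004 transitioned from EXITED_WITH_SUCCESS to DONE'

# Print just the transitions, e.g. "NEW -> LOCALIZING".
printf '%s\n' "$NM_EXCERPT" \
  | sed -n "s/.*${CID} transitioned from \(.*\) to \(.*\)/\1 -> \2/p"
```

This shows the container ran for only a few seconds before reaching EXITED_WITH_SUCCESS and DONE, i.e. the AM exited cleanly right after launch rather than failing.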

> When some tasks in one vertex fail, the DAG gets stuck forever.
> ----------------------------------------------------------------
>
>                 Key: TEZ-3273
>                 URL: https://issues.apache.org/jira/browse/TEZ-3273
>             Project: Apache Tez
>          Issue Type: Bug
>    Affects Versions: 0.5.2
>         Environment: hive0.14 hadoop2.6
>            Reporter: Feng Yuan
>
> Map 1: 145(+0,-1)/146 Reducer 2: 0/415
> Map 1: 145(+0,-1)/146 Reducer 2: 0/415
> Map 1: 145(+0,-1)/146 Reducer 2: 0/415
> Map 1: 145(+0,-1)/146 Reducer 2: 0/415
> Map 1: 145(+0,-1)/146 Reducer 2: 0/415
> Map 1: 145(+0,-1)/146 Reducer 2: 0/415
> Map 1: 145(+0,-1)/146 Reducer 2: 0/415
> stuck forever~



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
