Just some basic questions to see if I am understanding correctly:
- When you ask for the SamzaAppMaster log, do you mean the log of the container
whose ID ends with _000001? The logs that I find in the userlogs folder do not
say much: stdout shows the command launched, stderr shows nothing, and gc.log.0
has some lines like this:

2015-09-28T17:51:51.556+0200: 2,479: [GC 28273K->8028K(77312K), 0,0124110 secs]
2015-09-28T17:51:51.568+0200: 2,492: [Full GC 8028K->4959K(77312K), 0,1294490 secs]
2015-09-28T18:24:32.285+0200: 1963,208: [GC 37727K->8047K(77312K), 0,0087100 secs]
2015-09-28T18:51:51.699+0200: 3602,623: [GC 22599K->5977K(77312K), 0,0065030 secs]
2015-09-28T18:51:51.706+0200: 3602,629: [Full GC 5977K->2445K(77312K), 0,0914750 secs]
2015-09-28T19:51:51.799+0200: 7202,722: [GC 33571K->2541K(110080K), 0,0032800 secs]
2015-09-28T19:51:51.802+0200: 7202,725: [Full GC 2541K->2457K(110080K), 0,0921670 secs]
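
To check whether lines like these hide a long pause, a small script can help. The following is only a sketch: it assumes one record per line (as in the file itself, before the mail client wrapped it) and the comma decimal separator seen above. The name parse_gc_line is made up for illustration.

```python
import re

# One GC record per line, in the format shown in the excerpt above.
# The JVM locale here writes decimals with a comma ("0,0124110").
GC_LINE = re.compile(
    r"^(?P<stamp>\S+): (?P<uptime>[\d,]+): "
    r"\[(?P<kind>Full GC|GC) (?P<before>\d+)K->(?P<after>\d+)K\((?P<heap>\d+)K\), "
    r"(?P<pause>[\d,]+) secs\]$"
)

def parse_gc_line(line):
    """Return a dict for one GC record, or None if the line does not match."""
    m = GC_LINE.match(line.strip())
    if m is None:
        return None
    return {
        "kind": m.group("kind"),
        "uptime_s": float(m.group("uptime").replace(",", ".")),
        "before_kb": int(m.group("before")),
        "after_kb": int(m.group("after")),
        "heap_kb": int(m.group("heap")),
        "pause_s": float(m.group("pause").replace(",", ".")),
    }

# Two records copied from the excerpt above.
sample = [
    "2015-09-28T17:51:51.556+0200: 2,479: [GC 28273K->8028K(77312K), 0,0124110 secs]",
    "2015-09-28T17:51:51.568+0200: 2,492: [Full GC 8028K->4959K(77312K), 0,1294490 secs]",
]
records = [r for r in map(parse_gc_line, sample) if r is not None]
longest = max(records, key=lambda r: r["pause_s"])
print(longest["kind"], longest["pause_s"])
```

In the lines pasted above, the longest pause is about 0.13 seconds, so the GC log by itself does not seem to point at a problem.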

- I reinstalled the servers (following the process I described in the other
mail) and the problem persists. I am not sure where... When you speak of
leaking orphaned processes, do you mean a process still running in memory or
some kind of disk-level corruption? The first should be solved with a shutdown,
but the second may mean locating some files and deleting them.
- I am stuck with the problem, so I can say I am reproducing it. What should I
capture to send to the Yarn community?

Thanks,

        Jordi

-----Original Message-----
From: Yi Pan [mailto:nickpa...@gmail.com] 
Sent: Monday, September 28, 2015 20:04
To: dev@samza.apache.org
Subject: Re: process killing

What did the SamzaAppMaster log say? It seems to me that the Yarn RM lost
the state of the AppMaster instances. We have seen this in very rare
cases at LinkedIn (about once or twice a year), where Yarn leaks some
orphaned processes, but we have never been able to reproduce it reliably. If
you see this repeatedly, it would be nice to capture the reproduction sequence
and provide it to the Yarn community for help. Please keep us cc'ed as well.
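
One possible way to turn the ResourceManager log into a reproduction sequence is to pull out only the state transitions for a single application. The following is a sketch, not an official Yarn tool: the helper names join_records and timeline are hypothetical, and the patterns are based only on the "State change from X to Y" and "Container Transitioned from X to Y" records quoted further down in this thread. It also rejoins lines that were hard-wrapped and "> "-quoted by the mail client.

```python
import re

# A record starts with a timestamp such as "2015-09-28 11:22:20,493 ".
RECORD_START = re.compile(r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3} ")
# Matches application, attempt, and container transitions in the RM log.
STATE_CHANGE = re.compile(
    r"(?P<id>(?:application|appattempt|container)_\S+) "
    r"(?:Container Transitioned|State change) from (?P<old>\w+) to (?P<new>\w+)"
)

def join_records(lines):
    """Rebuild one string per log record from wrapped, '> '-quoted lines."""
    records = []
    for raw in lines:
        line = re.sub(r"^>\s?", "", raw.rstrip("\n")).strip()
        if not line:
            continue
        if RECORD_START.match(line) or not records:
            records.append(line)
        else:
            records[-1] += " " + line
    return records

def timeline(lines, app_key):
    """List (timestamp, id, old_state, new_state) for IDs containing app_key."""
    out = []
    for record in join_records(lines):
        m = STATE_CHANGE.search(record)
        if m and app_key in m.group("id"):
            out.append((record[:23], m.group("id"), m.group("old"), m.group("new")))
    return out

# Three records copied from the quoted log, wrapped as they appear in the mail.
sample = """\
> 2015-09-28 11:18:30,769 INFO  [AsyncDispatcher event handler]
> rmapp.RMAppImpl (RMAppImpl.java:handle(718)) -
> application_1443431699703_0001 State change from ACCEPTED to RUNNING
> 2015-09-28 11:22:17,901 INFO  [ResourceManager Event Processor]
> rmcontainer.RMContainerImpl (RMContainerImpl.java:handle(380)) -
> container_1443431699703_0003_01_000001 Container Transitioned from RUNNING
> to KILLED
> 2015-09-28 11:22:20,493 INFO  [AsyncDispatcher event handler]
> rmapp.RMAppImpl (RMAppImpl.java:handle(718)) -
> application_1443431699703_0001 State change from RUNNING to KILLING
""".splitlines()

for entry in timeline(sample, "1443431699703_0001"):
    print(*entry)
```

Filtering on the shared part of the ID (for example "1443431699703_0001") keeps the application, its attempt, and its containers in one timeline, which may be easier to attach to a report than the full log.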

Thanks!

-Yi

On Mon, Sep 28, 2015 at 2:33 AM, Jordi Blasi Uribarri <jbl...@nextel.es>
wrote:

> This is an excerpt from the yarn-root-resourcemanager-kfk-samza01.out
> file. Tell me if you need another file.
>
> Thanks,
>
>    jordi
>
> 0001 State change from NEW to SUBMITTED
> 2015-09-28 11:15:58,081 INFO  [ResourceManager Event Processor]
> capacity.LeafQueue (LeafQueue.java:activateApplications(626)) - Application
> application_1443431699703_0003 from user: root activated in queue: default
> 2015-09-28 11:15:58,081 INFO  [ResourceManager Event Processor]
> capacity.LeafQueue (LeafQueue.java:addApplicationAttempt(643)) -
> Application added - appId: application_1443431699703_0003 user:
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue$User@3d3c59a,
> leaf-queue: default #user-pending-applications: 0
> #user-active-applications: 3 #queue-pending-applications: 0
> #queue-active-applications: 3
> 2015-09-28 11:15:58,081 INFO  [ResourceManager Event Processor]
> capacity.CapacityScheduler
> (CapacityScheduler.java:addApplicationAttempt(746)) - Added Application
> Attempt appattempt_1443431699703_0003_000001 to scheduler from user root in
> queue default
> 2015-09-28 11:15:58,082 INFO  [AsyncDispatcher event handler]
> attempt.RMAppAttemptImpl (RMAppAttemptImpl.java:handle(764)) -
> appattempt_1443431699703_0003_000001 State change from SUBMITTED to
> SCHEDULED
> 2015-09-28 11:15:58,135 INFO  [ResourceManager Event Processor]
> rmcontainer.RMContainerImpl (RMContainerImpl.java:handle(380)) -
> container_1443431699703_0003_01_000001 Container Transitioned from NEW to
> ALLOCATED
> 2015-09-28 11:15:58,135 INFO  [ResourceManager Event Processor]
> resourcemanager.RMAuditLogger (RMAuditLogger.java:logSuccess(106)) -
> USER=root  OPERATION=AM Allocated Container     TARGET=SchedulerApp
>  RESULT=SUCCESS  APPID=application_1443431699703_0003
> CONTAINERID=container_1443431699703_0003_01_000001
> 2015-09-28 11:15:58,135 INFO  [ResourceManager Event Processor]
> scheduler.SchedulerNode (SchedulerNode.java:allocateContainer(141)) -
> Assigned container container_1443431699703_0003_01_000001 of capacity
> <memory:256, vCores:1> on host kfk-samza01:36066, which has 3 containers,
> <memory:768, vCores:3> used and <memory:1280, vCores:5> available after
> allocation
> 2015-09-28 11:15:58,136 INFO  [ResourceManager Event Processor]
> capacity.LeafQueue (LeafQueue.java:assignContainer(1570)) -
> assignedContainer application attempt=appattempt_1443431699703_0003_000001
> container=Container: [ContainerId: container_1443431699703_0003_01_000001,
> NodeId: kfk-samza01:36066, NodeHttpAddress: kfk-samza01:8042, Resource:
> <memory:256, vCores:1>, Priority: 0, Token: null, ] queue=default:
> capacity=1.0, absoluteCapacity=1.0, usedResources=<memory:512, vCores:2>,
> usedCapacity=0.125, absoluteUsedCapacity=0.125, numApps=3, numContainers=2
> clusterResource=<memory:4096, vCores:16>
> 2015-09-28 11:15:58,136 INFO  [ResourceManager Event Processor]
> capacity.ParentQueue (ParentQueue.java:assignContainersToChildQueues(601))
> - Re-sorting assigned queue: root.default stats: default: capacity=1.0,
> absoluteCapacity=1.0, usedResources=<memory:768, vCores:3>,
> usedCapacity=0.1875, absoluteUsedCapacity=0.1875, numApps=3, numContainers=3
> 2015-09-28 11:15:58,137 INFO  [ResourceManager Event Processor]
> capacity.ParentQueue (ParentQueue.java:assignContainers(464)) -
> assignedContainer queue=root usedCapacity=0.1875
> absoluteUsedCapacity=0.1875 used=<memory:768, vCores:3>
> cluster=<memory:4096, vCores:16>
> 2015-09-28 11:15:58,138 INFO  [AsyncDispatcher event handler]
> security.NMTokenSecretManagerInRM
> (NMTokenSecretManagerInRM.java:createAndGetNMToken(200)) - Sending NMToken
> for nodeId : kfk-samza01:36066 for container :
> container_1443431699703_0003_01_000001
> 2015-09-28 11:15:58,144 INFO  [AsyncDispatcher event handler]
> rmcontainer.RMContainerImpl (RMContainerImpl.java:handle(380)) -
> container_1443431699703_0003_01_000001 Container Transitioned from
> ALLOCATED to ACQUIRED
> 2015-09-28 11:15:58,144 INFO  [AsyncDispatcher event handler]
> security.NMTokenSecretManagerInRM
> (NMTokenSecretManagerInRM.java:clearNodeSetForAttempt(146)) - Clear node
> set for appattempt_1443431699703_0003_000001
> 2015-09-28 11:15:58,144 INFO  [AsyncDispatcher event handler]
> attempt.RMAppAttemptImpl (RMAppAttemptImpl.java:storeAttempt(1837)) -
> Storing attempt: AppId: application_1443431699703_0003 AttemptId:
> appattempt_1443431699703_0003_000001 MasterContainer: Container:
> [ContainerId: container_1443431699703_0003_01_000001, NodeId:
> kfk-samza01:36066, NodeHttpAddress: kfk-samza01:8042, Resource:
> <memory:256, vCores:1>, Priority: 0, Token: Token { kind: ContainerToken,
> service: 192.168.15.92:36066 }, ]
> 2015-09-28 11:15:58,145 INFO  [AsyncDispatcher event handler]
> attempt.RMAppAttemptImpl (RMAppAttemptImpl.java:handle(764)) -
> appattempt_1443431699703_0003_000001 State change from SCHEDULED to
> ALLOCATED_SAVING
> 2015-09-28 11:15:58,145 INFO  [AsyncDispatcher event handler]
> attempt.RMAppAttemptImpl (RMAppAttemptImpl.java:handle(764)) -
> appattempt_1443431699703_0003_000001 State change from ALLOCATED_SAVING to
> ALLOCATED
> 2015-09-28 11:15:58,161 INFO  [pool-1-thread-3] amlauncher.AMLauncher
> (AMLauncher.java:run(253)) - Launching
> masterappattempt_1443431699703_0003_000001
> 2015-09-28 11:15:58,165 INFO  [pool-1-thread-3] amlauncher.AMLauncher
> (AMLauncher.java:launch(106)) - Setting up container Container:
> [ContainerId: container_1443431699703_0003_01_000001, NodeId:
> kfk-samza01:36066, NodeHttpAddress: kfk-samza01:8042, Resource:
> <memory:256, vCores:1>, Priority: 0, Token: Token { kind: ContainerToken,
> service: 192.168.15.92:36066 }, ] for AM
> appattempt_1443431699703_0003_000001
> 2015-09-28 11:15:58,166 INFO  [pool-1-thread-3] amlauncher.AMLauncher
> (AMLauncher.java:createAMContainerLaunchContext(191)) - Command to launch
> container container_1443431699703_0003_01_000001 : export
> SAMZA_LOG_DIR=<LOG_DIR> && ln -sfn <LOG_DIR> logs && exec
> ./__package/bin/run-am.sh 1>logs/stdout 2>logs/stderr
> 2015-09-28 11:15:58,166 INFO  [pool-1-thread-3]
> security.AMRMTokenSecretManager
> (AMRMTokenSecretManager.java:createAndGetAMRMToken(195)) - Create AMRMToken
> for ApplicationAttempt: appattempt_1443431699703_0003_000001
> 2015-09-28 11:15:58,166 INFO  [pool-1-thread-3]
> security.AMRMTokenSecretManager
> (AMRMTokenSecretManager.java:createPassword(307)) - Creating password for
> appattempt_1443431699703_0003_000001
> 2015-09-28 11:15:58,194 INFO  [pool-1-thread-3] amlauncher.AMLauncher
> (AMLauncher.java:launch(127)) - Done launching container Container:
> [ContainerId: container_1443431699703_0003_01_000001, NodeId:
> kfk-samza01:36066, NodeHttpAddress: kfk-samza01:8042, Resource:
> <memory:256, vCores:1>, Priority: 0, Token: Token { kind: ContainerToken,
> service: 192.168.15.92:36066 }, ] for AM
> appattempt_1443431699703_0003_000001
> 2015-09-28 11:15:58,194 INFO  [AsyncDispatcher event handler]
> attempt.RMAppAttemptImpl (RMAppAttemptImpl.java:handle(764)) -
> appattempt_1443431699703_0003_000001 State change from ALLOCATED to LAUNCHED
> 2015-09-28 11:15:59,138 INFO  [ResourceManager Event Processor]
> rmcontainer.RMContainerImpl (RMContainerImpl.java:handle(380)) -
> container_1443431699703_0003_01_000001 Container Transitioned from ACQUIRED
> to RUNNING
> 2015-09-28 11:18:30,736 INFO  [Socket Reader #1 for port 8030] ipc.Server
> (Server.java:saslProcess(1306)) - Auth successful for
> appattempt_1443431699703_0001_000001 (auth:SIMPLE)
> 2015-09-28 11:18:30,766 INFO  [IPC Server handler 8 on 8030]
> resourcemanager.ApplicationMasterService
> (ApplicationMasterService.java:registerApplicationMaster(274)) - AM
> registration appattempt_1443431699703_0001_000001
> 2015-09-28 11:18:30,768 INFO  [IPC Server handler 8 on 8030]
> resourcemanager.RMAuditLogger (RMAuditLogger.java:logSuccess(127)) -
> USER=root     IP=192.168.15.92     OPERATION=Register App Master
>  TARGET=ApplicationMasterService RESULT=SUCCESS
> APPID=application_1443431699703_0001
> APPATTEMPTID=appattempt_1443431699703_0001_000001
> 2015-09-28 11:18:30,769 INFO  [AsyncDispatcher event handler]
> attempt.RMAppAttemptImpl (RMAppAttemptImpl.java:handle(764)) -
> appattempt_1443431699703_0001_000001 State change from LAUNCHED to RUNNING
> 2015-09-28 11:18:30,769 INFO  [AsyncDispatcher event handler]
> rmapp.RMAppImpl (RMAppImpl.java:handle(718)) -
> application_1443431699703_0001 State change from ACCEPTED to RUNNING
> 2015-09-28 11:18:31,952 INFO  [ResourceManager Event Processor]
> rmcontainer.RMContainerImpl (RMContainerImpl.java:handle(380)) -
> container_1443431699703_0001_01_000002 Container Transitioned from NEW to
> ALLOCATED
> 2015-09-28 11:18:31,952 INFO  [ResourceManager Event Processor]
> resourcemanager.RMAuditLogger (RMAuditLogger.java:logSuccess(106)) -
> USER=root  OPERATION=AM Allocated Container     TARGET=SchedulerApp
>  RESULT=SUCCESS  APPID=application_1443431699703_0001
> CONTAINERID=container_1443431699703_0001_01_000002
> 2015-09-28 11:18:31,953 INFO  [ResourceManager Event Processor]
> scheduler.SchedulerNode (SchedulerNode.java:allocateContainer(141)) -
> Assigned container container_1443431699703_0001_01_000002 of capacity
> <memory:256, vCores:1> on host kfk-samza02:59687, which has 1 containers,
> <memory:256, vCores:1> used and <memory:1792, vCores:7> available after
> allocation
> 2015-09-28 11:18:31,953 INFO  [ResourceManager Event Processor]
> capacity.LeafQueue (LeafQueue.java:assignContainer(1570)) -
> assignedContainer application attempt=appattempt_1443431699703_0001_000001
> container=Container: [ContainerId: container_1443431699703_0001_01_000002,
> NodeId: kfk-samza02:59687, NodeHttpAddress: kfk-samza02:8042, Resource:
> <memory:256, vCores:1>, Priority: 0, Token: null, ] queue=default:
> capacity=1.0, absoluteCapacity=1.0, usedResources=<memory:768, vCores:3>,
> usedCapacity=0.1875, absoluteUsedCapacity=0.1875, numApps=3,
> numContainers=3 clusterResource=<memory:4096, vCores:16>
> 2015-09-28 11:18:31,953 INFO  [ResourceManager Event Processor]
> capacity.ParentQueue (ParentQueue.java:assignContainersToChildQueues(601))
> - Re-sorting assigned queue: root.default stats: default: capacity=1.0,
> absoluteCapacity=1.0, usedResources=<memory:1024, vCores:4>,
> usedCapacity=0.25, absoluteUsedCapacity=0.25, numApps=3, numContainers=4
> 2015-09-28 11:18:31,954 INFO  [ResourceManager Event Processor]
> capacity.ParentQueue (ParentQueue.java:assignContainers(464)) -
> assignedContainer queue=root usedCapacity=0.25 absoluteUsedCapacity=0.25
> used=<memory:1024, vCores:4> cluster=<memory:4096, vCores:16>
> 2015-09-28 11:18:32,921 INFO  [IPC Server handler 18 on 8030]
> security.NMTokenSecretManagerInRM
> (NMTokenSecretManagerInRM.java:createAndGetNMToken(200)) - Sending NMToken
> for nodeId : kfk-samza02:59687 for container :
> container_1443431699703_0001_01_000002
> 2015-09-28 11:18:32,923 INFO  [IPC Server handler 18 on 8030]
> rmcontainer.RMContainerImpl (RMContainerImpl.java:handle(380)) -
> container_1443431699703_0001_01_000002 Container Transitioned from
> ALLOCATED to ACQUIRED
> 2015-09-28 11:18:33,980 INFO  [ResourceManager Event Processor]
> rmcontainer.RMContainerImpl (RMContainerImpl.java:handle(380)) -
> container_1443431699703_0001_01_000002 Container Transitioned from ACQUIRED
> to RUNNING
> 2015-09-28 11:22:17,889 INFO  [AsyncDispatcher event handler]
> rmapp.RMAppImpl (RMAppImpl.java:handle(718)) -
> application_1443431699703_0003 State change from ACCEPTED to KILLING
> 2015-09-28 11:22:17,890 INFO  [AsyncDispatcher event handler]
> attempt.RMAppAttemptImpl
> (RMAppAttemptImpl.java:rememberTargetTransitionsAndStoreState(1129)) -
> Updating application attempt appattempt_1443431699703_0003_000001 with
> final state: KILLED, and exit status: -1000
> 2015-09-28 11:22:17,891 INFO  [AsyncDispatcher event handler]
> attempt.RMAppAttemptImpl (RMAppAttemptImpl.java:handle(764)) -
> appattempt_1443431699703_0003_000001 State change from LAUNCHED to
> FINAL_SAVING
> 2015-09-28 11:22:17,892 INFO  [AsyncDispatcher event handler]
> resourcemanager.ApplicationMasterService
> (ApplicationMasterService.java:unregisterAttempt(676)) - Unregistering app
> attempt : appattempt_1443431699703_0003_000001
> 2015-09-28 11:22:17,893 INFO  [AsyncDispatcher event handler]
> security.AMRMTokenSecretManager
> (AMRMTokenSecretManager.java:applicationMasterFinished(124)) - Application
> finished, removing password for appattempt_1443431699703_0003_000001
> 2015-09-28 11:22:17,893 INFO  [AsyncDispatcher event handler]
> attempt.RMAppAttemptImpl (RMAppAttemptImpl.java:handle(764)) -
> appattempt_1443431699703_0003_000001 State change from FINAL_SAVING to
> KILLED
> 2015-09-28 11:22:17,894 INFO  [AsyncDispatcher event handler]
> rmapp.RMAppImpl
> (RMAppImpl.java:rememberTargetTransitionsAndStoreState(992)) - Updating
> application application_1443431699703_0003 with final state: KILLED
> 2015-09-28 11:22:17,895 INFO  [AsyncDispatcher event handler]
> rmapp.RMAppImpl (RMAppImpl.java:handle(718)) -
> application_1443431699703_0003 State change from KILLING to FINAL_SAVING
> 2015-09-28 11:22:17,895 INFO  [ResourceManager Event Processor]
> capacity.CapacityScheduler
> (CapacityScheduler.java:doneApplicationAttempt(785)) - Application Attempt
> appattempt_1443431699703_0003_000001 is done. finalState=KILLED
> 2015-09-28 11:22:17,895 INFO  [AsyncDispatcher event handler]
> recovery.RMStateStore (RMStateStore.java:transition(161)) - Updating info
> for app: application_1443431699703_0003
> 2015-09-28 11:22:17,901 INFO  [ResourceManager Event Processor]
> rmcontainer.RMContainerImpl (RMContainerImpl.java:handle(380)) -
> container_1443431699703_0003_01_000001 Container Transitioned from RUNNING
> to KILLED
> 2015-09-28 11:22:17,901 INFO  [ResourceManager Event Processor]
> fica.FiCaSchedulerApp (FiCaSchedulerApp.java:containerCompleted(98)) -
> Completed container: container_1443431699703_0003_01_000001 in state:
> KILLED event:KILL
> 2015-09-28 11:22:17,901 INFO  [ResourceManager Event Processor]
> resourcemanager.RMAuditLogger (RMAuditLogger.java:logSuccess(106)) -
> USER=root  OPERATION=AM Released Container      TARGET=SchedulerApp
>  RESULT=SUCCESS  APPID=application_1443431699703_0003
> CONTAINERID=container_1443431699703_0003_01_000001
> 2015-09-28 11:22:17,902 INFO  [ResourceManager Event Processor]
> scheduler.SchedulerNode (SchedulerNode.java:releaseContainer(204)) -
> Released container container_1443431699703_0003_01_000001 of capacity
> <memory:256, vCores:1> on host kfk-samza01:36066, which currently has 2
> containers, <memory:512, vCores:2> used and <memory:1536, vCores:6>
> available, release resources=true
> 2015-09-28 11:22:17,902 INFO  [ResourceManager Event Processor]
> capacity.LeafQueue (LeafQueue.java:releaseResource(1723)) - default
> used=<memory:768, vCores:3> numContainers=3 user=root
> user-resources=<memory:768, vCores:3>
> 2015-09-28 11:22:17,903 INFO  [ResourceManager Event Processor]
> capacity.LeafQueue (LeafQueue.java:completedContainer(1674)) -
> completedContainer container=Container: [ContainerId:
> container_1443431699703_0003_01_000001, NodeId: kfk-samza01:36066,
> NodeHttpAddress: kfk-samza01:8042, Resource: <memory:256, vCores:1>,
> Priority: 0, Token: Token { kind: ContainerToken, service:
> 192.168.15.92:36066 }, ] queue=default: capacity=1.0,
> absoluteCapacity=1.0, usedResources=<memory:768, vCores:3>,
> usedCapacity=0.1875, absoluteUsedCapacity=0.1875, numApps=3,
> numContainers=3 cluster=<memory:4096, vCores:16>
> 2015-09-28 11:22:17,903 INFO  [ResourceManager Event Processor]
> capacity.ParentQueue (ParentQueue.java:completedContainer(646)) -
> completedContainer queue=root usedCapacity=0.1875
> absoluteUsedCapacity=0.1875 used=<memory:768, vCores:3>
> cluster=<memory:4096, vCores:16>
> 2015-09-28 11:22:17,904 INFO  [ResourceManager Event Processor]
> capacity.ParentQueue (ParentQueue.java:completedContainer(664)) -
> Re-sorting completed queue: root.default stats: default: capacity=1.0,
> absoluteCapacity=1.0, usedResources=<memory:768, vCores:3>,
> usedCapacity=0.1875, absoluteUsedCapacity=0.1875, numApps=3, numContainers=3
> 2015-09-28 11:22:17,904 INFO  [ResourceManager Event Processor]
> capacity.CapacityScheduler
> (CapacityScheduler.java:completedContainer(1215)) - Application attempt
> appattempt_1443431699703_0003_000001 released container
> container_1443431699703_0003_01_000001 on node: host: kfk-samza01:36066
> #containers=2 available=1536 used=512 with event: KILL
> 2015-09-28 11:22:17,904 INFO  [AsyncDispatcher event handler]
> rmapp.RMAppImpl (RMAppImpl.java:handle(718)) -
> application_1443431699703_0003 State change from FINAL_SAVING to KILLED
> 2015-09-28 11:22:17,905 INFO  [ResourceManager Event Processor]
> scheduler.AppSchedulingInfo (AppSchedulingInfo.java:clearRequests(115)) -
> Application application_1443431699703_0003 requests cleared
> 2015-09-28 11:22:17,906 INFO  [ResourceManager Event Processor]
> capacity.LeafQueue (LeafQueue.java:removeApplicationAttempt(686)) -
> Application removed - appId: application_1443431699703_0003 user: root
> queue: default #user-pending-applications: 0 #user-active-applications: 2
> #queue-pending-applications: 0 #queue-active-applications: 2
> 2015-09-28 11:22:17,906 INFO  [ResourceManager Event Processor]
> capacity.ParentQueue (ParentQueue.java:removeApplication(411)) -
> Application removed - appId: application_1443431699703_0003 user: root
> leaf-queue of parent: root #applications: 2
> 2015-09-28 11:22:17,907 INFO  [AsyncDispatcher event handler]
> resourcemanager.RMAuditLogger (RMAuditLogger.java:logSuccess(148)) -
> USER=root    OPERATION=Application Finished - Killed
> TARGET=RMAppManager     RESULT=SUCCESS  APPID=application_1443431699703_0003
> 2015-09-28 11:22:17,909 INFO  [pool-1-thread-4] amlauncher.AMLauncher
> (AMLauncher.java:run(267)) - Cleaning master
> appattempt_1443431699703_0003_000001
> 2015-09-28 11:22:17,910 INFO  [AsyncDispatcher event handler]
> resourcemanager.RMAppManager$ApplicationSummary
> (RMAppManager.java:logAppSummary(179)) -
> appId=application_1443431699703_0003,name=flow.OperationJob_1,user=root,queue=default,state=KILLED,trackingUrl=
> http://kfk-samza01:8088/cluster/app/application_1443431699703_0003,appMasterHost=N/A,startTime=1443431758077,finishTime=1443432137894,finalStatus=KILLED
> 2015-09-28 11:22:18,102 INFO  [IPC Server handler 3 on 8032]
> resourcemanager.RMAuditLogger (RMAuditLogger.java:logSuccess(148)) -
> USER=root     IP=192.168.15.92     OPERATION=Kill Application Request
> TARGET=ClientRMService  RESULT=SUCCESS  APPID=application_1443431699703_0003
> 2015-09-28 11:22:20,493 INFO  [AsyncDispatcher event handler]
> rmapp.RMAppImpl (RMAppImpl.java:handle(718)) -
> application_1443431699703_0001 State change from RUNNING to KILLING
> 2015-09-28 11:22:20,494 INFO  [AsyncDispatcher event handler]
> attempt.RMAppAttemptImpl
> (RMAppAttemptImpl.java:rememberTargetTransitionsAndStoreState(1129)) -
> Updating application attempt appattempt_1443431699703_0001_000001 with
> final state: KILLED, and exit status: -1000
> 2015-09-28 11:22:20,494 INFO  [AsyncDispatcher event handler]
> attempt.RMAppAttemptImpl (RMAppAttemptImpl.java:handle(764)) -
> appattempt_1443431699703_0001_000001 State change from RUNNING to
> FINAL_SAVING
> 2015-09-28 11:22:20,494 INFO  [AsyncDispatcher event handler]
> resourcemanager.ApplicationMasterService
> (ApplicationMasterService.java:unregisterAttempt(676)) - Unregistering app
> attempt : appattempt_1443431699703_0001_000001
> 2015-09-28 11:22:20,495 INFO  [AsyncDispatcher event handler]
> security.AMRMTokenSecretManager
> (AMRMTokenSecretManager.java:applicationMasterFinished(124)) - Application
> finished, removing password for appattempt_1443431699703_0001_000001
> 2015-09-28 11:22:20,495 INFO  [AsyncDispatcher event handler]
> attempt.RMAppAttemptImpl (RMAppAttemptImpl.java:handle(764)) -
> appattempt_1443431699703_0001_000001 State change from FINAL_SAVING to
> KILLED
> 2015-09-28 11:22:20,495 INFO  [AsyncDispatcher event handler]
> rmapp.RMAppImpl
> (RMAppImpl.java:rememberTargetTransitionsAndStoreState(992)) - Updating
> application application_1443431699703_0001 with final state: KILLED
> 2015-09-28 11:22:20,495 INFO  [AsyncDispatcher event handler]
> rmapp.RMAppImpl (RMAppImpl.java:handle(718)) -
> application_1443431699703_0001 State change from KILLING to FINAL_SAVING
> 2015-09-28 11:22:20,496 INFO  [ResourceManager Event Processor]
> capacity.CapacityScheduler
> (CapacityScheduler.java:doneApplicationAttempt(785)) - Application Attempt
> appattempt_1443431699703_0001_000001 is done. finalState=KILLED
> 2015-09-28 11:22:20,496 INFO  [AsyncDispatcher event handler]
> recovery.RMStateStore (RMStateStore.java:transition(161)) - Updating info
> for app: application_1443431699703_0001
> 2015-09-28 11:22:20,496 INFO  [ResourceManager Event Processor]
> rmcontainer.RMContainerImpl (RMContainerImpl.java:handle(380)) -
> container_1443431699703_0001_01_000002 Container Transitioned from RUNNING
> to KILLED
> 2015-09-28 11:22:20,496 INFO  [AsyncDispatcher event handler]
> rmapp.RMAppImpl (RMAppImpl.java:handle(718)) -
> application_1443431699703_0001 State change from FINAL_SAVING to KILLED
> 2015-09-28 11:22:20,496 INFO  [pool-1-thread-5] amlauncher.AMLauncher
> (AMLauncher.java:run(267)) - Cleaning master
> appattempt_1443431699703_0001_000001
> 2015-09-28 11:22:20,497 INFO  [AsyncDispatcher event handler]
> resourcemanager.RMAuditLogger (RMAuditLogger.java:logSuccess(148)) -
> USER=root    OPERATION=Application Finished - Killed
> TARGET=RMAppManager     RESULT=SUCCESS  APPID=application_1443431699703_0001
> 2015-09-28 11:22:20,496 INFO  [ResourceManager Event Processor]
> fica.FiCaSchedulerApp (FiCaSchedulerApp.java:containerCompleted(98)) -
> Completed container: container_1443431699703_0001_01_000002 in state:
> KILLED event:KILL
> 2015-09-28 11:22:20,497 INFO  [ResourceManager Event Processor]
> resourcemanager.RMAuditLogger (RMAuditLogger.java:logSuccess(106)) -
> USER=root  OPERATION=AM Released Container      TARGET=SchedulerApp
>  RESULT=SUCCESS  APPID=application_1443431699703_0001
> CONTAINERID=container_1443431699703_0001_01_000002
> 2015-09-28 11:22:20,497 INFO  [AsyncDispatcher event handler]
> resourcemanager.RMAppManager$ApplicationSummary
> (RMAppManager.java:logAppSummary(179)) -
> appId=application_1443431699703_0001,name=flow.Router_1,user=root,queue=default,state=KILLED,trackingUrl=
> http://kfk-samza01:8088/cluster/app/application_1443431699703_0001,appMasterHost=N/A,startTime=1443431751189,finishTime=1443432140495,finalStatus=KILLED
> 2015-09-28 11:22:20,497 INFO  [ResourceManager Event Processor]
> scheduler.SchedulerNode (SchedulerNode.java:releaseContainer(204)) -
> Released container container_1443431699703_0001_01_000002 of capacity
> <memory:256, vCores:1> on host kfk-samza02:59687, which currently has 0
> containers, <memory:0, vCores:0> used and <memory:2048, vCores:8>
> available, release resources=true
> 2015-09-28 11:22:20,498 INFO  [ResourceManager Event Processor]
> capacity.LeafQueue (LeafQueue.java:releaseResource(1723)) - default
> used=<memory:512, vCores:2> numContainers=2 user=root
> user-resources=<memory:512, vCores:2>
> 2015-09-28 11:22:20,498 INFO  [ResourceManager Event Processor]
> capacity.LeafQueue (LeafQueue.java:completedContainer(1674)) -
> completedContainer container=Container: [ContainerId:
> container_1443431699703_0001_01_000002, NodeId: kfk-samza02:59687,
> NodeHttpAddress: kfk-samza02:8042, Resource: <memory:256, vCores:1>,
> Priority: 0, Token: Token { kind: ContainerToken, service:
> 192.168.15.94:59687 }, ] queue=default: capacity=1.0,
> absoluteCapacity=1.0, usedResources=<memory:512, vCores:2>,
> usedCapacity=0.125, absoluteUsedCapacity=0.125, numApps=2, numContainers=2
> cluster=<memory:4096, vCores:16>
> 2015-09-28 11:22:20,499 INFO  [ResourceManager Event Processor]
> capacity.ParentQueue (ParentQueue.java:completedContainer(646)) -
> completedContainer queue=root usedCapacity=0.125 absoluteUsedCapacity=0.125
> used=<memory:512, vCores:2> cluster=<memory:4096, vCores:16>
> 2015-09-28 11:22:20,499 INFO  [ResourceManager Event Processor]
> capacity.ParentQueue (ParentQueue.java:completedContainer(664)) -
> Re-sorting completed queue: root.default stats: default: capacity=1.0,
> absoluteCapacity=1.0, usedResources=<memory:512, vCores:2>,
> usedCapacity=0.125, absoluteUsedCapacity=0.125, numApps=2, numContainers=2
> 2015-09-28 11:22:20,499 INFO  [ResourceManager Event Processor]
> capacity.CapacityScheduler
> (CapacityScheduler.java:completedContainer(1215)) - Application attempt
> appattempt_1443431699703_0001_000001 released container
> container_1443431699703_0001_01_000002 on node: host: kfk-samza02:59687
> #containers=0 available=2048 used=0 with event: KILL
> 2015-09-28 11:22:20,500 INFO  [ResourceManager Event Processor]
> rmcontainer.RMContainerImpl (RMContainerImpl.java:handle(380)) -
> container_1443431699703_0001_01_000001 Container Transitioned from RUNNING
> to KILLED
> 2015-09-28 11:22:20,500 INFO  [ResourceManager Event Processor]
> fica.FiCaSchedulerApp (FiCaSchedulerApp.java:containerCompleted(98)) -
> Completed container: container_1443431699703_0001_01_000001 in state:
> KILLED event:KILL
> 2015-09-28 11:22:20,500 INFO  [ResourceManager Event Processor]
> resourcemanager.RMAuditLogger (RMAuditLogger.java:logSuccess(106)) -
> USER=root  OPERATION=AM Released Container      TARGET=SchedulerApp
>  RESULT=SUCCESS  APPID=application_1443431699703_0001
> CONTAINERID=container_1443431699703_0001_01_000001
> 2015-09-28 11:22:20,500 INFO  [ResourceManager Event Processor]
> scheduler.SchedulerNode (SchedulerNode.java:releaseContainer(204)) -
> Released container container_1443431699703_0001_01_000001 of capacity
> <memory:256, vCores:1> on host kfk-samza01:36066, which currently has 1
> containers, <memory:256, vCores:1> used and <memory:1792, vCores:7>
> available, release resources=true
> 2015-09-28 11:22:20,500 INFO  [ResourceManager Event Processor]
> capacity.LeafQueue (LeafQueue.java:releaseResource(1723)) - default
> used=<memory:256, vCores:1> numContainers=1 user=root
> user-resources=<memory:256, vCores:1>
> 2015-09-28 11:22:20,501 INFO  [ResourceManager Event Processor]
> capacity.LeafQueue (LeafQueue.java:completedContainer(1674)) -
> completedContainer container=Container: [ContainerId:
> container_1443431699703_0001_01_000001, NodeId: kfk-samza01:36066,
> NodeHttpAddress: kfk-samza01:8042, Resource: <memory:256, vCores:1>,
> Priority: 0, Token: Token { kind: ContainerToken, service:
> 192.168.15.92:36066 }, ] queue=default: capacity=1.0,
> absoluteCapacity=1.0, usedResources=<memory:256, vCores:1>,
> usedCapacity=0.0625, absoluteUsedCapacity=0.0625, numApps=2,
> numContainers=1 cluster=<memory:4096, vCores:16>
> 2015-09-28 11:22:20,501 INFO  [ResourceManager Event Processor]
> capacity.ParentQueue (ParentQueue.java:completedContainer(646)) -
> completedContainer queue=root usedCapacity=0.0625
> absoluteUsedCapacity=0.0625 used=<memory:256, vCores:1>
> cluster=<memory:4096, vCores:16>
> 2015-09-28 11:22:20,501 INFO  [ResourceManager Event Processor]
> capacity.ParentQueue (ParentQueue.java:completedContainer(664)) -
> Re-sorting completed queue: root.default stats: default: capacity=1.0,
> absoluteCapacity=1.0, usedResources=<memory:256, vCores:1>,
> usedCapacity=0.0625, absoluteUsedCapacity=0.0625, numApps=2, numContainers=1
> 2015-09-28 11:22:20,501 INFO  [ResourceManager Event Processor]
> capacity.CapacityScheduler
> (CapacityScheduler.java:completedContainer(1215)) - Application attempt
> appattempt_1443431699703_0001_000001 released container
> container_1443431699703_0001_01_000001 on node: host: kfk-samza01:36066
> #containers=1 available=1792 used=256 with event: KILL
> 2015-09-28 11:22:20,502 INFO  [ResourceManager Event Processor]
> scheduler.AppSchedulingInfo (AppSchedulingInfo.java:clearRequests(115)) -
> Application application_1443431699703_0001 requests cleared
> 2015-09-28 11:22:20,502 INFO  [ResourceManager Event Processor]
> capacity.LeafQueue (LeafQueue.java:removeApplicationAttempt(686)) -
> Application removed - appId: application_1443431699703_0001 user: root
> queue: default #user-pending-applications: 0 #user-active-applications: 1
> #queue-pending-applications: 0 #queue-active-applications: 1
> 2015-09-28 11:22:20,502 INFO  [ResourceManager Event Processor]
> capacity.ParentQueue (ParentQueue.java:removeApplication(411)) -
> Application removed - appId: application_1443431699703_0001 user: root
> leaf-queue of parent: root #applications: 1
> 2015-09-28 11:22:20,700 INFO  [IPC Server handler 4 on 8032]
> resourcemanager.RMAuditLogger (RMAuditLogger.java:logSuccess(148)) -
> USER=root     IP=192.168.15.92     OPERATION=Kill Application Request
> TARGET=ClientRMService  RESULT=SUCCESS  APPID=application_1443431699703_0001
> 2015-09-28 11:22:20,971 ERROR [IPC Server handler 22 on 8030]
> resourcemanager.ApplicationMasterService
> (ApplicationMasterService.java:allocate(435)) - Application attempt
> appattempt_1443431699703_0001_000001 doesn't exist in
> ApplicationMasterService cache.
> 2015-09-28 11:22:20,974 INFO  [IPC Server handler 22 on 8030] ipc.Server
> (Server.java:run(2060)) - IPC Server handler 22 on 8030, call
> org.apache.hadoop.yarn.api.ApplicationMasterProtocolPB.allocate from
> 192.168.15.92:53988 Call#231 Retry#0
> org.apache.hadoop.yarn.exceptions.ApplicationAttemptNotFoundException:
> Application attempt appattempt_1443431699703_0001_000001 doesn't exist in
> ApplicationMasterService cache.
>         at
> org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService.allocate(ApplicationMasterService.java:436)
>         at
> org.apache.hadoop.yarn.api.impl.pb.service.ApplicationMasterProtocolPBServiceImpl.allocate(ApplicationMasterProtocolPBServiceImpl.java:60)
>         at
> org.apache.hadoop.yarn.proto.ApplicationMasterProtocol$ApplicationMasterProtocolService$2.callBlockingMethod(ApplicationMasterProtocol.java:99)
>         at
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:619)
>         at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:962)
>         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2039)
>         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2035)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:415)
>         at
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
>         at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2033)
> 2015-09-28 11:22:23,161 INFO  [AsyncDispatcher event handler]
> rmapp.RMAppImpl (RMAppImpl.java:handle(718)) -
> application_1443431699703_0002 State change from ACCEPTED to KILLING
> 2015-09-28 11:22:23,162 INFO  [AsyncDispatcher event handler]
> attempt.RMAppAttemptImpl
> (RMAppAttemptImpl.java:rememberTargetTransitionsAndStoreState(1129)) -
> Updating application attempt appattempt_1443431699703_0002_000001 with
> final state: KILLED, and exit status: -1000
> 2015-09-28 11:22:23,163 INFO  [AsyncDispatcher event handler]
> attempt.RMAppAttemptImpl (RMAppAttemptImpl.java:handle(764)) -
> appattempt_1443431699703_0002_000001 State change from LAUNCHED to
> FINAL_SAVING
> 2015-09-28 11:22:23,163 INFO  [AsyncDispatcher event handler]
> resourcemanager.ApplicationMasterService
> (ApplicationMasterService.java:unregisterAttempt(676)) - Unregistering app
> attempt : appattempt_1443431699703_0002_000001
> 2015-09-28 11:22:23,164 INFO  [AsyncDispatcher event handler]
> security.AMRMTokenSecretManager
> (AMRMTokenSecretManager.java:applicationMasterFinished(124)) - Application
> finished, removing password for appattempt_1443431699703_0002_000001
> 2015-09-28 11:22:23,164 INFO  [AsyncDispatcher event handler]
> attempt.RMAppAttemptImpl (RMAppAttemptImpl.java:handle(764)) -
> appattempt_1443431699703_0002_000001 State change from FINAL_SAVING to
> KILLED
> 2015-09-28 11:22:23,165 INFO  [AsyncDispatcher event handler]
> rmapp.RMAppImpl
> (RMAppImpl.java:rememberTargetTransitionsAndStoreState(992)) - Updating
> application application_1443431699703_0002 with final state: KILLED
> 2015-09-28 11:22:23,165 INFO  [AsyncDispatcher event handler]
> rmapp.RMAppImpl (RMAppImpl.java:handle(718)) -
> application_1443431699703_0002 State change from KILLING to FINAL_SAVING
> 2015-09-28 11:22:23,165 INFO  [ResourceManager Event Processor]
> capacity.CapacityScheduler
> (CapacityScheduler.java:doneApplicationAttempt(785)) - Application Attempt
> appattempt_1443431699703_0002_000001 is done. finalState=KILLED
> 2015-09-28 11:22:23,166 INFO  [AsyncDispatcher event handler]
> recovery.RMStateStore (RMStateStore.java:transition(161)) - Updating info
> for app: application_1443431699703_0002
> 2015-09-28 11:22:23,166 INFO  [ResourceManager Event Processor]
> rmcontainer.RMContainerImpl (RMContainerImpl.java:handle(380)) -
> container_1443431699703_0002_01_000001 Container Transitioned from RUNNING
> to KILLED
> 2015-09-28 11:22:23,166 INFO  [AsyncDispatcher event handler]
> rmapp.RMAppImpl (RMAppImpl.java:handle(718)) -
> application_1443431699703_0002 State change from FINAL_SAVING to KILLED
> 2015-09-28 11:22:23,166 INFO  [ResourceManager Event Processor]
> fica.FiCaSchedulerApp (FiCaSchedulerApp.java:containerCompleted(98)) -
> Completed container: container_1443431699703_0002_01_000001 in state:
> KILLED event:KILL
> 2015-09-28 11:22:23,167 INFO  [AsyncDispatcher event handler]
> resourcemanager.RMAuditLogger (RMAuditLogger.java:logSuccess(148)) -
> USER=root    OPERATION=Application Finished - Killed
> TARGET=RMAppManager     RESULT=SUCCESS  APPID=application_1443431699703_0002
> 2015-09-28 11:22:23,167 INFO  [ResourceManager Event Processor]
> resourcemanager.RMAuditLogger (RMAuditLogger.java:logSuccess(106)) -
> USER=root  OPERATION=AM Released Container      TARGET=SchedulerApp
>  RESULT=SUCCESS  APPID=application_1443431699703_0002
> CONTAINERID=container_1443431699703_0002_01_000001
> 2015-09-28 11:22:23,167 INFO  [ResourceManager Event Processor]
> scheduler.SchedulerNode (SchedulerNode.java:releaseContainer(204)) -
> Released container container_1443431699703_0002_01_000001 of capacity
> <memory:256, vCores:1> on host kfk-samza01:36066, which currently has 0
> containers, <memory:0, vCores:0> used and <memory:2048, vCores:8>
> available, release resources=true
> 2015-09-28 11:22:23,168 INFO  [ResourceManager Event Processor]
> capacity.LeafQueue (LeafQueue.java:releaseResource(1723)) - default
> used=<memory:0, vCores:0> numContainers=0 user=root
> user-resources=<memory:0, vCores:0>
> 2015-09-28 11:22:23,168 INFO  [ResourceManager Event Processor]
> capacity.LeafQueue (LeafQueue.java:completedContainer(1674)) -
> completedContainer container=Container: [ContainerId:
> container_1443431699703_0002_01_000001, NodeId: kfk-samza01:36066,
> NodeHttpAddress: kfk-samza01:8042, Resource: <memory:256, vCores:1>,
> Priority: 0, Token: Token { kind: ContainerToken, service:
> 192.168.15.92:36066 }, ] queue=default: capacity=1.0,
> absoluteCapacity=1.0, usedResources=<memory:0, vCores:0>, usedCapacity=0.0,
> absoluteUsedCapacity=0.0, numApps=1, numContainers=0 cluster=<memory:4096,
> vCores:16>
> 2015-09-28 11:22:23,169 INFO  [ResourceManager Event Processor]
> capacity.ParentQueue (ParentQueue.java:completedContainer(646)) -
> completedContainer queue=root usedCapacity=0.0 absoluteUsedCapacity=0.0
> used=<memory:0, vCores:0> cluster=<memory:4096, vCores:16>
> 2015-09-28 11:22:23,169 INFO  [ResourceManager Event Processor]
> capacity.ParentQueue (ParentQueue.java:completedContainer(664)) -
> Re-sorting completed queue: root.default stats: default: capacity=1.0,
> absoluteCapacity=1.0, usedResources=<memory:0, vCores:0>, usedCapacity=0.0,
> absoluteUsedCapacity=0.0, numApps=1, numContainers=0
> 2015-09-28 11:22:23,169 INFO  [ResourceManager Event Processor]
> capacity.CapacityScheduler
> (CapacityScheduler.java:completedContainer(1215)) - Application attempt
> appattempt_1443431699703_0002_000001 released container
> container_1443431699703_0002_01_000001 on node: host: kfk-samza01:36066
> #containers=0 available=2048 used=0 with event: KILL
> 2015-09-28 11:22:23,169 INFO  [ResourceManager Event Processor]
> scheduler.AppSchedulingInfo (AppSchedulingInfo.java:clearRequests(115)) -
> Application application_1443431699703_0002 requests cleared
> 2015-09-28 11:22:23,170 INFO  [pool-1-thread-6] amlauncher.AMLauncher
> (AMLauncher.java:run(267)) - Cleaning master
> appattempt_1443431699703_0002_000001
> 2015-09-28 11:22:23,168 INFO  [AsyncDispatcher event handler]
> resourcemanager.RMAppManager$ApplicationSummary
> (RMAppManager.java:logAppSummary(179)) -
> appId=application_1443431699703_0002,name=flow.WorkFlow_1,user=root,queue=default,state=KILLED,trackingUrl=
> http://kfk-samza01:8088/cluster/app/application_1443431699703_0002,appMasterHost=N/A,startTime=1443431754782,finishTime=1443432143165,finalStatus=KILLED
> 2015-09-28 11:22:23,170 INFO  [ResourceManager Event Processor]
> capacity.LeafQueue (LeafQueue.java:removeApplicationAttempt(686)) -
> Application removed - appId: application_1443431699703_0002 user: root
> queue: default #user-pending-applications: 0 #user-active-applications: 0
> #queue-pending-applications: 0 #queue-active-applications: 0
> 2015-09-28 11:22:23,172 INFO  [ResourceManager Event Processor]
> capacity.ParentQueue (ParentQueue.java:removeApplication(411)) -
> Application removed - appId: application_1443431699703_0002 user: root
> leaf-queue of parent: root #applications: 0
> 2015-09-28 11:22:23,184 INFO  [IPC Server handler 20 on 8030]
> resourcemanager.ApplicationMasterService
> (ApplicationMasterService.java:finishApplicationMaster(351)) -
> application_1443431699703_0001 unregistered successfully.
> 2015-09-28 11:22:23,371 INFO  [IPC Server handler 0 on 8032]
> resourcemanager.RMAuditLogger (RMAuditLogger.java:logSuccess(148)) -
> USER=root     IP=192.168.15.92     OPERATION=Kill Application Request
> TARGET=ClientRMService  RESULT=SUCCESS  APPID=application_1443431699703_0002
> 2015-09-28 11:22:24,195 INFO  [ResourceManager Event Processor]
> capacity.CapacityScheduler
> (CapacityScheduler.java:completedContainer(1190)) - Null container
> completed...
> 2015-09-28 11:24:59,693 INFO  [Timer-3] scheduler.AbstractYarnScheduler
> (AbstractYarnScheduler.java:run(407)) - Release request cache is cleaned up
>
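A minimal Python sketch of how the ResourceManager log above could be summarized, assuming the real log file keeps each entry on a single line (the excerpt quoted here is wrapped by the mail client). The function name `summarize_rm_log` and both regular expressions are illustrative assumptions, not part of YARN or Samza. It records the last state the RM logged for each application and collects the attempts that triggered the "doesn't exist in ApplicationMasterService cache" error, which may suggest an AppMaster process was still calling the RM after its attempt had been removed.

```python
import re

# Patterns are assumptions based on the log excerpt quoted in this thread.
STATE_CHANGE = re.compile(r"(application_\d+_\d+) State change from (\w+) to (\w+)")
MISSING_ATTEMPT = re.compile(
    r"Application attempt (appattempt_\d+_\d+_\d+) doesn't exist"
)


def summarize_rm_log(lines):
    """Return (last state per application, attempts the RM no longer knows)."""
    last_state = {}
    missing_attempts = set()
    for line in lines:
        match = STATE_CHANGE.search(line)
        if match:
            # Later entries overwrite earlier ones, so the final value
            # is the most recent state the RM logged for that application.
            last_state[match.group(1)] = match.group(3)
        match = MISSING_ATTEMPT.search(line)
        if match:
            missing_attempts.add(match.group(1))
    return last_state, missing_attempts
```

It could be fed the open ResourceManager log file directly, since iterating over a file yields one line at a time.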
> -----Original Message-----
> From: Yi Pan [mailto:nickpa...@gmail.com]
> Sent: Monday, September 28, 2015 10:37
> To: dev@samza.apache.org
> Subject: Re: process killing
>
> Hm... interesting. What did you see in the application master's logs? I
> saw that the remaining processes running are SamzaAppMasters.
>
> On Tue, Sep 22, 2015 at 1:05 AM, Jordi Blasi Uribarri <jbl...@nextel.es>
> wrote:
>
> > Hi,
> >
> > I have two machines running YARN and Samza. They run Samza 0.9.1 and
> > Hadoop 2.6.0.
> >
> > I ran the kill-all.sh script I recently wrote, which calls
> > kill-yarn-job.sh. This is the output:
> >
> > java version "1.7.0_79"
> > OpenJDK Runtime Environment (IcedTea 2.5.6) (7u79-2.5.6-1~deb7u1)
> > OpenJDK 64-Bit Server VM (build 24.79-b02, mixed mode)
> > SLF4J: Class path contains multiple SLF4J bindings.
> > SLF4J: Found binding in
> > [jar:file:/opt/jobs/lib/redirect-0.0.1.jar!/org/slf4j/impl/StaticLogge
> > rBinder.class]
> > SLF4J: Found binding in
> > [jar:file:/opt/jobs/lib/samzafroga-0.0.1-jar-with-dependencies.jar!/or
> > g/slf4j/impl/StaticLoggerBinder.class]
> > SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an
> > explanation.
> > java version "1.7.0_79"
> > OpenJDK Runtime Environment (IcedTea 2.5.6) (7u79-2.5.6-1~deb7u1)
> > OpenJDK 64-Bit Server VM (build 24.79-b02, mixed mode)
> > /usr/lib/jvm/java-7-openjdk-amd64/bin/java
> > -Dlog4j.configuration=file:bin/log4j-console.xml
> > -Dsamza.log.dir=/opt/jobs -Djava.io.tmpdir=/opt/jobs/tmp -Xmx768M
> > -XX:+PrintGCDateStamps -Xloggc:/opt/jobs/gc.log
> > -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=10
> > -XX:GCLogFileSize=10241024 -d64 -cp
> > /opt/hadoop-2.6.0/conf:/opt/jobs/lib/redirect-0.0.1.jar:/opt/jobs/lib/
> > samzafroga-0.0.1-jar-with-dependencies.jar
> > org.apache.hadoop.yarn.client.cli.ApplicationCLI application -kill
> > application_1442908447829_0001
> > 2015-09-22 10:02:46 RMProxy [INFO] Connecting to ResourceManager at
> > kfk-samza01/192.168.15.92:8032
> > SLF4J: Class path contains multiple SLF4J bindings.
> > SLF4J: Found binding in
> > [jar:file:/opt/jobs/lib/redirect-0.0.1.jar!/org/slf4j/impl/StaticLogge
> > rBinder.class]
> > SLF4J: Found binding in
> > [jar:file:/opt/jobs/lib/samzafroga-0.0.1-jar-with-dependencies.jar!/or
> > g/slf4j/impl/StaticLoggerBinder.class]
> > SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an
> > explanation.
> > 2015-09-22 10:02:46 NativeCodeLoader [WARN] Unable to load
> > native-hadoop library for your platform... using builtin-java classes
> > where applicable Killing application application_1442908447829_0001
> > 2015-09-22 10:02:47 YarnClientImpl [INFO] Killed application
> > application_1442908447829_0001
> > java version "1.7.0_79"
> > OpenJDK Runtime Environment (IcedTea 2.5.6) (7u79-2.5.6-1~deb7u1)
> > OpenJDK 64-Bit Server VM (build 24.79-b02, mixed mode)
> > /usr/lib/jvm/java-7-openjdk-amd64/bin/java
> > -Dlog4j.configuration=file:bin/log4j-console.xml
> > -Dsamza.log.dir=/opt/jobs -Djava.io.tmpdir=/opt/jobs/tmp -Xmx768M
> > -XX:+PrintGCDateStamps -Xloggc:/opt/jobs/gc.log
> > -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=10
> > -XX:GCLogFileSize=10241024 -d64 -cp
> > /opt/hadoop-2.6.0/conf:/opt/jobs/lib/redirect-0.0.1.jar:/opt/jobs/lib/
> > samzafroga-0.0.1-jar-with-dependencies.jar
> > org.apache.hadoop.yarn.client.cli.ApplicationCLI application -kill
> > application_1442908447829_0002
> > 2015-09-22 10:02:49 RMProxy [INFO] Connecting to ResourceManager at
> > kfk-samza01/192.168.15.92:8032
> > SLF4J: Class path contains multiple SLF4J bindings.
> > SLF4J: Found binding in
> > [jar:file:/opt/jobs/lib/redirect-0.0.1.jar!/org/slf4j/impl/StaticLogge
> > rBinder.class]
> > SLF4J: Found binding in
> > [jar:file:/opt/jobs/lib/samzafroga-0.0.1-jar-with-dependencies.jar!/or
> > g/slf4j/impl/StaticLoggerBinder.class]
> > SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an
> > explanation.
> > 2015-09-22 10:02:49 NativeCodeLoader [WARN] Unable to load
> > native-hadoop library for your platform... using builtin-java classes
> > where applicable Killing application application_1442908447829_0002
> > 2015-09-22 10:02:50 YarnClientImpl [INFO] Killed application
> > application_1442908447829_0002
> >
> > When I run ps -fe | grep java, I see this:
> >
> > root      9542     1  3 09:54 pts/0    00:00:21
> > /usr/lib/jvm/java-7-openjdk-amd64/bin/java -Dproc_resourcemanager
> > -Xmx1000m -Dhadoop.log.dir=/opt/hadoop-2.6.0/logs
> > -Dyarn.log.dir=/opt/hadoop-2.6.0/logs
> > -Dhadoop.log.file=yarn-root-resourcemanager-kfk-samza01.log
> > -Dyarn.log.file=yarn-root-resourcemanager-kfk-samza01.log
> > -Dyarn.home.dir=/opt/hadoop-2.6.0 -Dhadoop.home.dir=/opt/hadoop-2.6.0
> > -Dhadoop.root.logger=INFO,RFA -Dyarn.root.logger=INFO,RFA
> > -Djava.library.path=/opt/hadoop-2.6.0/lib/native -classpath
> > /opt/hadoop-2.6.0/conf:/opt/hadoop-2.6.0/conf:/opt/hadoop-2.6.0/conf:/
> > opt/hadoop-2.6.0/share/hadoop/common/lib/*:/opt/hadoop-2.6.0/share/had
> > oop/common/*:/opt/hadoop-2.6.0/share/hadoop/hdfs:/opt/hadoop-2.6.0/sha
> > re/hadoop/hdfs/lib/*:/opt/hadoop-2.6.0/share/hadoop/hdfs/*:/opt/hadoop
> > -2.6.0/share/hadoop/yarn/lib/*:/opt/hadoop-2.6.0/share/hadoop/yarn/*:/
> > opt/hadoop-2.6.0/share/hadoop/mapreduce/lib/*:/opt/hadoop-2.6.0/share/
> > hadoop/mapreduce/*:/opt/hadoop-2.6.0/share/hadoop/yarn/*:/opt/hadoop-2
> > .6.0/share/hadoop/yarn/lib/*:/opt/hadoop-2.6.0/conf/rm-config/log4j.pr
> > operties org.apache.hadoop.yarn.server.resourcemanager.ResourceManager
> > root      9814     1  4 09:54 ?        00:00:24
> > /usr/lib/jvm/java-7-openjdk-amd64/bin/java -Dproc_nodemanager
> > -Xmx1000m -server -Dhadoop.log.dir=/opt/hadoop-2.6.0/logs
> > -Dyarn.log.dir=/opt/hadoop-2.6.0/logs
> > -Dhadoop.log.file=yarn-root-nodemanager-kfk-samza01.log
> > -Dyarn.log.file=yarn-root-nodemanager-kfk-samza01.log
> > -Dyarn.home.dir=/opt/hadoop-2.6.0 -Dhadoop.home.dir=/opt/hadoop-2.6.0
> > -Dhadoop.root.logger=INFO,RFA -Dyarn.root.logger=INFO,RFA
> > -Djava.library.path=/opt/hadoop-2.6.0/lib/native -classpath
> > /opt/hadoop-2.6.0/conf:/opt/hadoop-2.6.0/conf:/opt/hadoop-2.6.0/conf:/
> > opt/hadoop-2.6.0/share/hadoop/common/lib/*:/opt/hadoop-2.6.0/share/had
> > oop/common/*:/opt/hadoop-2.6.0/share/hadoop/hdfs:/opt/hadoop-2.6.0/sha
> > re/hadoop/hdfs/lib/*:/opt/hadoop-2.6.0/share/hadoop/hdfs/*:/opt/hadoop
> > -2.6.0/share/hadoop/yarn/lib/*:/opt/hadoop-2.6.0/share/hadoop/yarn/*:/
> > opt/hadoop-2.6.0/share/hadoop/mapreduce/lib/*:/opt/hadoop-2.6.0/share/
> > hadoop/mapreduce/*:/opt/hadoop-2.6.0/share/hadoop/yarn/*:/opt/hadoop-2
> > .6.0/share/hadoop/yarn/lib/*:/opt/hadoop-2.6.0/conf/nm-config/log4j.pr
> > operties org.apache.hadoop.yarn.server.nodemanager.NodeManager
> > root     10271 10268  0 09:54 ?        00:00:05
> > /usr/lib/jvm/java-7-openjdk-amd64/bin/java -server
> > -Dsamza.container.name=samza-application-master
> > -Dsamza.log.dir=/opt/hadoop-2.6.0/logs/userlogs/application_1442908447
> > 829_0002/container_1442908447829_0002_01_000001
> > -Djava.io.tmpdir=/tmp/hadoop-root/nm-local-dir/usercache/root/appcache
> > /application_1442908447829_0002/container_1442908447829_0002_01_000001
> > /__package/tmp
> > -Xmx768M -XX:+PrintGCDateStamps
> > -Xloggc:/opt/hadoop-2.6.0/logs/userlogs/application_1442908447829_0002
> > /container_1442908447829_0002_01_000001/gc.log
> > -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=10
> > -XX:GCLogFileSize=10241024 -d64 -cp
> > /opt/hadoop-2.6.0/conf:/tmp/hadoop-root/nm-local-dir/usercache/root/ap
> > pcache/application_1442908447829_0002/container_1442908447829_0002_01_
> > 000001/__package/lib/jackson-annotations-2.6.0.jar:/tmp/hadoop-root/nm
> > -local-dir/usercache/root/appcache/application_1442908447829_0002/cont
> > ainer_1442908447829_0002_01_000001/__package/lib/jackson-core-2.6.0.ja
> > r:/tmp/hadoop-root/nm-local-dir/usercache/root/appcache/application_14
> > 42908447829_0002/container_1442908447829_0002_01_000001/__package/lib/
> > jackson-databind-2.6.0.jar:/tmp/hadoop-root/nm-local-dir/usercache/roo
> > t/appcache/application_1442908447829_0002/container_1442908447829_0002
> > _01_000001/__package/lib/jackson-dataformat-smile-2.6.0.jar:/tmp/hadoo
> > p-root/nm-local-dir/usercache/root/appcache/application_1442908447829_
> > 0002/container_1442908447829_0002_01_000001/__package/lib/jackson-jaxr
> > s-json-provider-2.6.0.jar:/tmp/hadoop-root/nm-local-dir/usercache/root
> > /appcache/application_1442908447829_0002/container_1442908447829_0002_
> > 01_000001/__package/lib/jackson-module-jaxb-annotations-2.6.0.jar:/tmp
> > /hadoop-root/nm-local-dir/usercache/root/appcache/application_14429084
> > 47829_0002/container_1442908447829_0002_01_000001/__package/lib/nxtBro
> > ker-0.0.1.jar:/tmp/hadoop-root/nm-local-dir/usercache/root/appcache/ap
> > plication_1442908447829_0002/container_1442908447829_0002_01_000001/__
> > package/lib/nxtBroker-0.0.1-jar-with-dependencies.jar
> > org.apache.samza.job.yarn.SamzaAppMaster
> > root     10346 10344  0 09:54 ?        00:00:04
> > /usr/lib/jvm/java-7-openjdk-amd64/bin/java -server
> > -Dsamza.container.name=samza-application-master
> > -Dsamza.log.dir=/opt/hadoop-2.6.0/logs/userlogs/application_1442908447
> > 829_0001/container_1442908447829_0001_01_000001
> > -Djava.io.tmpdir=/tmp/hadoop-root/nm-local-dir/usercache/root/appcache
> > /application_1442908447829_0001/container_1442908447829_0001_01_000001
> > /__package/tmp
> > -Xmx768M -XX:+PrintGCDateStamps
> > -Xloggc:/opt/hadoop-2.6.0/logs/userlogs/application_1442908447829_0001
> > /container_1442908447829_0001_01_000001/gc.log
> > -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=10
> > -XX:GCLogFileSize=10241024 -d64 -cp
> > /opt/hadoop-2.6.0/conf:/tmp/hadoop-root/nm-local-dir/usercache/root/ap
> > pcache/application_1442908447829_0001/container_1442908447829_0001_01_
> > 000001/__package/lib/jackson-annotations-2.6.0.jar:/tmp/hadoop-root/nm
> > -local-dir/usercache/root/appcache/application_1442908447829_0001/cont
> > ainer_1442908447829_0001_01_000001/__package/lib/jackson-core-2.6.0.ja
> > r:/tmp/hadoop-root/nm-local-dir/usercache/root/appcache/application_14
> > 42908447829_0001/container_1442908447829_0001_01_000001/__package/lib/
> > jackson-databind-2.6.0.jar:/tmp/hadoop-root/nm-local-dir/usercache/roo
> > t/appcache/application_1442908447829_0001/container_1442908447829_0001
> > _01_000001/__package/lib/jackson-dataformat-smile-2.6.0.jar:/tmp/hadoo
> > p-root/nm-local-dir/usercache/root/appcache/application_1442908447829_
> > 0001/container_1442908447829_0001_01_000001/__package/lib/jackson-jaxr
> > s-json-provider-2.6.0.jar:/tmp/hadoop-root/nm-local-dir/usercache/root
> > /appcache/application_1442908447829_0001/container_1442908447829_0001_
> > 01_000001/__package/lib/jackson-module-jaxb-annotations-2.6.0.jar:/tmp
> > /hadoop-root/nm-local-dir/usercache/root/appcache/application_14429084
> > 47829_0001/container_1442908447829_0001_01_000001/__package/lib/nxtBro
> > ker-0.0.1.jar:/tmp/hadoop-root/nm-local-dir/usercache/root/appcache/ap
> > plication_1442908447829_0001/container_1442908447829_0001_01_000001/__
> > package/lib/nxtBroker-0.0.1-jar-with-dependencies.jar
> > org.apache.samza.job.yarn.SamzaAppMaster
> >
> > As you can see, the processes are still there. In the web application
> > they appear as KILLED.
> >
> > Thanks,
> >
> >    Jordi
> >
> > -----Original Message-----
> > From: Yan Fang [mailto:yanfang...@gmail.com]
> > Sent: Tuesday, September 22, 2015 9:59
> > To: dev@samza.apache.org
> > Subject: Re: process killing
> >
> > Hi Jordi,
> >
> > 1. Are you running the job on a single-machine YARN setup or on a cluster?
> >
> > 2. What kind of Java processes do you see after killing the YARN
> > application? Usually, when we run kill-yarn-job applicationId, all of
> > the processes are killed (this is actually done by YARN).
> >
> > 3. Which versions of Samza and YARN are you using? This sometimes matters.
> >
> > Thanks,
> >
> > Fang, Yan
> > yanfang...@gmail.com
> >
> > On Tue, Sep 22, 2015 at 3:42 PM, Jordi Blasi Uribarri
> > <jbl...@nextel.es>
> > wrote:
> >
> > > Hi,
> > >
> > > I am currently developing a solution using Samza, and during
> > > development I need to constantly change the code and test it in the
> > > system. What I am seeing is that most of the time, when I kill a job
> > > using the kill-yarn-job script, the job is killed according to the web
> > > interface but the Java process keeps running. I have also seen that
> > > the job was actually still executing, as I got messages at the far end
> > > of the application. I have been killing these processes manually
> > > (kill -9), but I have some questions:
> > >
> > >
> > > - Is there a reason for the processes not to be killed? It is not a
> > > matter of time, as I could still find them hours later.
> > >
> > > - I don’t know whether any other action is needed to completely clean
> > > up the information, or whether killing the process the hard way is
> > > enough.
> > >
> > > - I am finding some memory consumption problems, and I don’t know
> > > whether they are related to this. Maybe I will describe them in
> > > another message.
> > >
> > > Thanks,
> > >
> > >   Jordi
> > > ________________________________
> > > Jordi Blasi Uribarri
> > > Área I+D+i
> > >
> > > jbl...@nextel.es
> > > Oficina Bilbao
> > >
> > > [http://www.nextel.es/wp-content/uploads/Firma_Nextel_2015.png]
> > >
> >
>
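A minimal Python sketch of the manual check described in this thread: scan `ps -fe` output for leftover SamzaAppMaster processes and match them against applications that YARN already reports as KILLED. The function names `find_app_masters` and `orphaned` are hypothetical, and the parsing assumes the `ps -fe` column layout shown above (PID in the second column, one process per line) and that the application ID appears somewhere on the command line, as it does in the `-Dsamza.log.dir` path.

```python
import re

# Class name taken from the ps output quoted in this thread.
APP_MASTER_CLASS = "org.apache.samza.job.yarn.SamzaAppMaster"
APP_ID = re.compile(r"application_\d+_\d+")


def find_app_masters(ps_output):
    """Return (pid, application_id) for each SamzaAppMaster line in ps -fe output."""
    found = []
    for line in ps_output.splitlines():
        if APP_MASTER_CLASS not in line:
            continue
        fields = line.split()
        # Assumed layout: UID PID PPID C STIME TTY TIME CMD...
        if len(fields) < 2 or not fields[1].isdigit():
            continue
        match = APP_ID.search(line)
        found.append((int(fields[1]), match.group(0) if match else None))
    return found


def orphaned(ps_output, killed_app_ids):
    """Return PIDs of AppMasters whose application YARN already reports as killed."""
    return [pid for pid, app_id in find_app_masters(ps_output)
            if app_id in killed_app_ids]
```

The `ps` text could come from `subprocess.run(["ps", "-fe"], capture_output=True, text=True).stdout`, and the set of killed application IDs from the web interface or the ResourceManager log. The sketch only reports PIDs; whether to send them a signal is left to the operator.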
