Gour Saha created SLIDER-438:
--------------------------------

             Summary: Slider agent continues to run in the container on a node 
where NM dies
                 Key: SLIDER-438
                 URL: https://issues.apache.org/jira/browse/SLIDER-438
             Project: Slider
          Issue Type: Bug
          Components: agent, agent-provider
            Reporter: Gour Saha


Steps to reproduce:
- Set up a 3-node cluster (in non-HA mode)
- Run slider create for the HBase app-package (with HMaster and HRegionServer 
components only, just to keep things simple)
- Make sure the HRegionServer came up on a different node from the HMaster and 
the Slider AM (if not, repeating destroy and create a couple of times should 
get you to this setup)
- Kill the NM on the node where the HRegionServer is running
- Restart the NM within 10 minutes (the default time after which the RM 
marks the node as LOST, configurable using 
yarn.nm.liveness-monitor.expiry-interval-ms)
- At this point the Slider AM receives the container-lost event from the RM, 
marks the container as lost, and requests a new one from the RM. A new 
HRegionServer container comes up (on the same host where the old one was 
running). Both HRegionServer containers then continue to run alongside each 
other, and both heartbeat successfully to the AM.

Expected:
- Given that the first HRegionServer instance is still heartbeating to the AM, 
the AM should be able to send it a kill signal and bring the agent/container 
down.
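
A minimal sketch of the expected behaviour, under stated assumptions: the AM 
tracks which container IDs it still considers live, and answers any heartbeat 
from a container outside that set with a terminate command. Every name here 
(LostContainerHeartbeatSketch, Command, onHeartbeat and so on) is hypothetical 
and is not taken from the Slider code base; the container IDs are made up.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical sketch only; not the Slider AM implementation. */
final class LostContainerHeartbeatSketch {

    /** Assumed commands the AM could return in a heartbeat response. */
    enum Command { CONTINUE, TERMINATE }

    // Container IDs the AM currently considers live.
    private final Set<String> liveContainers = ConcurrentHashMap.newKeySet();

    /** Called when the RM hands the AM a new container. */
    void onContainerAllocated(String containerId) {
        liveContainers.add(containerId);
    }

    /** Called when the RM reports the container as lost. */
    void onContainerLost(String containerId) {
        liveContainers.remove(containerId);
    }

    /**
     * Called for every agent heartbeat. An agent whose container is no
     * longer in the live set (lost or never known) is told to terminate.
     */
    Command onHeartbeat(String containerId) {
        return liveContainers.contains(containerId)
                ? Command.CONTINUE
                : Command.TERMINATE;
    }

    public static void main(String[] args) {
        LostContainerHeartbeatSketch am = new LostContainerHeartbeatSketch();

        am.onContainerAllocated("container_old"); // first HRegionServer
        am.onContainerLost("container_old");      // NM died, RM reports loss
        am.onContainerAllocated("container_new"); // replacement HRegionServer

        // The old agent is still alive and heartbeating after the NM restart.
        System.out.println("old agent: " + am.onHeartbeat("container_old"));
        System.out.println("new agent: " + am.onHeartbeat("container_new"));
    }
}
```

Treating unknown containers the same as lost ones is a design choice made for 
this sketch; it avoids keeping a separate, ever-growing set of lost IDs.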




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
