It seems I was not running the new build after compiling. Please disregard my question.

I am trying the new build now.

On 10/28/2014 02:40 PM, Rui Zhang wrote:
I applied the fix, but it still does not work.
Actually, the steps to reproduce in SLIDER-439 are different from mine.
What I do is first use the "freeze" command and then kill one node manager. I wait long enough for the node manager to leave the YARN cluster, and then use the "thaw" command to restart. However, the instance that was running on the killed node is not able to restart.

Here is part of the log.

14/10/28 18:25:42 INFO mortbay.log: Logging to org.slf4j.impl.Log4jLoggerAdapter(org.mortbay.log) via org.mortbay.log.Slf4jLog
14/10/28 18:25:42 INFO zookeeper.ClientCnxn: Session establishment complete on server vertica1/172.17.42.1:16433, sessionid = 0x14957f07d6f011f, negotiated timeout = 40000
14/10/28 18:25:42 INFO state.ConnectionStateManager: State change: CONNECTED
14/10/28 18:25:42 INFO mortbay.log: jetty-6.1.26
Oct 28, 2014 6:25:42 PM com.sun.jersey.api.core.PackagesResourceConfig init
INFO: Scanning for root resource and provider classes in the packages:
  org.apache.slider.server.appmaster.web.rest.agent
Oct 28, 2014 6:25:42 PM com.sun.jersey.api.core.ScanningResourceConfig logClasses
INFO: Root resource classes found:
  class org.apache.slider.server.appmaster.web.rest.agent.AgentWebServices
Oct 28, 2014 6:25:42 PM com.sun.jersey.api.core.ScanningResourceConfig init
INFO: No provider classes found.
Oct 28, 2014 6:25:42 PM com.sun.jersey.server.impl.application.WebApplicationImpl _initiate
INFO: Initiating Jersey application, version 'Jersey: 1.9 09/02/2011 11:17 AM'
14/10/28 18:25:43 INFO mortbay.log: Started [email protected]:46561
14/10/28 18:25:43 INFO mortbay.log: Started [email protected]:36451
14/10/28 18:25:43 INFO http.HttpRequestLog: Http request log for http.requests.slideram is not defined
14/10/28 18:25:43 INFO http.HttpServer2: Added global filter 'safety' (class=org.apache.hadoop.http.HttpServer2$QuotingInputFilter)
14/10/28 18:25:43 INFO http.HttpServer2: Added filter AM_PROXY_FILTER (class=org.apache.slider.server.appmaster.web.SliderAmIpFilter) to context slideram
14/10/28 18:25:43 INFO http.HttpServer2: Added filter AM_PROXY_FILTER (class=org.apache.slider.server.appmaster.web.SliderAmIpFilter) to context static
14/10/28 18:25:43 INFO http.HttpServer2: adding path spec: /slideram/*
14/10/28 18:25:43 INFO http.HttpServer2: adding path spec: /ws/*
14/10/28 18:25:43 INFO http.HttpServer2: Jetty bound to port 47481
14/10/28 18:25:43 INFO mortbay.log: jetty-6.1.26
14/10/28 18:25:43 INFO mortbay.log: Extract jar:file:/home/rzhang/Slider_Vertica/Linux64/Test/verticadb1000/HDP2_1/hadoop/local/usercache/rzhang/appcache/application_1414519516219_0002/filecache/18/slider.jar!/webapps/slideram to /tmp/Jetty_0_0_0_0_47481_slideram____.7p4s4g/webapp
14/10/28 18:25:43 INFO mortbay.log: Started [email protected]:47481
14/10/28 18:25:43 INFO webapp.WebApps: Web app /slideram started at 47481
14/10/28 18:25:43 INFO webapp.WebApps: Registered webapp guice modules
14/10/28 18:25:43 INFO appmaster.SliderAppMaster: Connecting to RM at 46522,address tracking URL=http://vertica2.rzhang.com:47481
14/10/28 18:25:43 INFO agent.AgentUtils: Reading metainfo at hdfs://rzhang-HP-ZBook-15:10433/slider/slider_test.zip
14/10/28 18:25:44 INFO tools.SliderUtils: Reading metainfo.xml of size 3193
14/10/28 18:25:44 INFO agent.HeartbeatMonitor: Starting heartbeat monitor with interval 60000
14/10/28 18:25:44 INFO state.AppState: Adding new role VERTICA_MASTER
14/10/28 18:25:44 INFO state.AppState: Role VERTICA_MASTER assigned priority 1
14/10/28 18:25:44 INFO state.AppState: Adding new role VERTICA_SLAVE
14/10/28 18:25:44 INFO state.AppState: Role VERTICA_SLAVE assigned priority 2
14/10/28 18:25:44 INFO state.AppState: Role slider-appmaster flexed from 0 to 1
14/10/28 18:25:44 INFO state.AppState: Role VERTICA_SLAVE flexed from 0 to 2
14/10/28 18:25:44 INFO state.AppState: Role VERTICA_MASTER flexed from 0 to 1
14/10/28 18:25:44 INFO state.RoleHistory: loaded history from hdfs://rzhang-HP-ZBook-15:10433/user/rzhang/.slider/cluster/slider_test/history/rolehistory-0000014957f14d86.json
14/10/28 18:25:44 INFO appmaster.SliderAppMaster: service instances already running: []
14/10/28 18:25:44 INFO curator.RegistryBinderService: registering ServiceInstance{name='org-apache-slider', id='slider_test', address='172.17.0.3', port=47481, sslPort=null, payload=ServiceInstanceData{id='slider_test', serviceType='org-apache-slider'}, registrationTimeUTC=1414520744939, serviceType=DYNAMIC, uriSpec=org.apache.curator.x.discovery.UriSpec@54515c2}
14/10/28 18:25:45 INFO curator.RegistryBinderService: registration completed ServiceInstance{name='org-apache-slider', id='slider_test', address='172.17.0.3', port=47481, sslPort=null, payload=ServiceInstanceData{id='slider_test', serviceType='org-apache-slider'}, registrationTimeUTC=1414520744939, serviceType=DYNAMIC, uriSpec=org.apache.curator.x.discovery.UriSpec@54515c2}
14/10/28 18:25:45 INFO appmaster.SliderAppMaster: Chaos monkey disabled
14/10/28 18:25:45 INFO appmaster.SliderAppMaster: Adding Chaos Monkey scheduled every 0 seconds (0 hours)
14/10/28 18:25:45 INFO workflow.WorkflowCompositeService: Child service completed Service SliderAMProviderService in state SliderAMProviderService: STOPPED; current service null; queued service count=0
14/10/28 18:25:45 INFO appmaster.SliderAppMaster: Process has exited with exit code 0 mapped to 0 -ignoring
14/10/28 18:25:45 INFO state.AppState: RoleStatus{name='VERTICA_SLAVE', key=2, desired=2, actual=0, requested=0, releasing=0, failed=0, started=0, startFailed=0, completed=0, failureMessage=''}
14/10/28 18:25:45 INFO state.AppState: VERTICA_SLAVE: Asking for 2 more nodes(s) for a total of 2
14/10/28 18:25:45 INFO state.RoleHistory: There're 2 nodes to consider for VERTICA_SLAVE
14/10/28 18:25:45 INFO state.OutstandingRequest: Submitting request for container on vertica2.rzhang.com
14/10/28 18:25:45 INFO state.AppState: Container ask is Capability[<memory:1024, vCores:1>]Priority[2]
14/10/28 18:25:45 INFO state.RoleHistory: There're 1 nodes to consider for VERTICA_SLAVE
14/10/28 18:25:45 INFO state.OutstandingRequest: Submitting request for container on vertica0.rzhang.com
14/10/28 18:25:45 INFO state.AppState: Container ask is Capability[<memory:1024, vCores:1>]Priority[2]
14/10/28 18:25:45 INFO state.AppState: RoleStatus{name='VERTICA_MASTER', key=1, desired=1, actual=0, requested=0, releasing=0, failed=0, started=0, startFailed=0, completed=0, failureMessage=''}
14/10/28 18:25:45 INFO state.AppState: VERTICA_MASTER: Asking for 1 more nodes(s) for a total of 1
14/10/28 18:25:45 INFO state.RoleHistory: There're 1 nodes to consider for VERTICA_MASTER
14/10/28 18:25:45 INFO state.OutstandingRequest: Submitting request for container on vertica1
14/10/28 18:25:45 INFO state.AppState: Container ask is Capability[<memory:1024, vCores:1>]Priority[1]
14/10/28 18:25:45 INFO util.RackResolver: Resolved vertica2.rzhang.com to /default-rack
14/10/28 18:25:45 INFO util.RackResolver: Resolved vertica0.rzhang.com to /default-rack
14/10/28 18:25:45 INFO util.RackResolver: Resolved vertica1 to /default-rack
14/10/28 18:25:46 INFO impl.AMRMClientImpl: Received new token for : vertica0.rzhang.com:54106
14/10/28 18:25:46 INFO impl.AMRMClientImpl: Received new token for : vertica2.rzhang.com:41175
14/10/28 18:25:46 INFO appmaster.SliderAppMaster: onContainersAllocated(2)
14/10/28 18:25:46 INFO state.AppState: Assigning role VERTICA_SLAVE to container container_1414519516219_0002_01_000002, on vertica0.rzhang.com:54106,
14/10/28 18:25:46 INFO state.AppState: Assigning role VERTICA_SLAVE to container container_1414519516219_0002_01_000003, on vertica2.rzhang.com:41175,
14/10/28 18:25:46 INFO appmaster.SliderAppMaster: Diagnostics:
  RoleStatus{name='slider-appmaster', key=0, desired=1, actual=0, requested=0, releasing=0, failed=0, started=0, startFailed=0, completed=0, failureMessage=''}
  RoleStatus{name='VERTICA_SLAVE', key=2, desired=2, actual=2, requested=0, releasing=0, failed=0, started=0, startFailed=0, completed=0, failureMessage=''}
  RoleStatus{name='VERTICA_MASTER', key=1, desired=1, actual=0, requested=1, releasing=0, failed=0, started=0, startFailed=0, completed=0, failureMessage=''}

14/10/28 18:25:46 INFO agent.AgentProviderService: Build launch context for Agent
14/10/28 18:25:46 INFO agent.AgentProviderService: Build launch context for Agent
14/10/28 18:25:46 INFO agent.AgentProviderService: AGENT_WORK_ROOT set to $PWD
14/10/28 18:25:46 INFO agent.AgentProviderService: AGENT_LOG_ROOT set to $LOG_DIRS
14/10/28 18:25:46 INFO agent.AgentProviderService: PYTHONPATH set to ./infra/agent/slider-agent/
14/10/28 18:25:46 INFO agent.AgentProviderService: AGENT_WORK_ROOT set to $PWD
14/10/28 18:25:46 INFO agent.AgentProviderService: AGENT_LOG_ROOT set to $LOG_DIRS
14/10/28 18:25:46 INFO agent.AgentProviderService: PYTHONPATH set to ./infra/agent/slider-agent/
14/10/28 18:25:46 INFO agent.AgentProviderService: Using ./infra/agent/slider-agent/agent/main.py for agent.
14/10/28 18:25:46 INFO agent.AgentProviderService: Using ./infra/agent/slider-agent/agent/main.py for agent.
14/10/28 18:25:46 INFO appmaster.RoleLaunchService: Starting container with command: python ./infra/agent/slider-agent/agent/main.py --label container_1414519516219_0002_01_000002___VERTICA_SLAVE --zk-quorum rzhang-HP-ZBook-15:16433 --zk-reg-path /registry/org-apache-slider/slider_test ;
14/10/28 18:25:46 INFO appmaster.RoleLaunchService: Starting container with command: python ./infra/agent/slider-agent/agent/main.py --label container_1414519516219_0002_01_000003___VERTICA_SLAVE --zk-quorum rzhang-HP-ZBook-15:16433 --zk-reg-path /registry/org-apache-slider/slider_test ;
14/10/28 18:25:46 INFO impl.NMClientAsyncImpl: Processing Event EventType: START_CONTAINER for Container container_1414519516219_0002_01_000002
14/10/28 18:25:46 INFO impl.NMClientAsyncImpl: Processing Event EventType: START_CONTAINER for Container container_1414519516219_0002_01_000003
14/10/28 18:25:46 INFO impl.ContainerManagementProtocolProxy: Opening proxy : vertica0.rzhang.com:54106
14/10/28 18:25:46 INFO impl.ContainerManagementProtocolProxy: Opening proxy : vertica2.rzhang.com:41175
14/10/28 18:25:46 INFO appmaster.SliderAppMaster: Started Container container_1414519516219_0002_01_000002
14/10/28 18:25:46 INFO appmaster.SliderAppMaster: Started Container container_1414519516219_0002_01_000003
14/10/28 18:25:47 INFO appmaster.SliderAppMaster: Deployed instance of role VERTICA_SLAVE onto container_1414519516219_0002_01_000002
14/10/28 18:25:47 INFO appmaster.SliderAppMaster: Registering component container_1414519516219_0002_01_000002
14/10/28 18:25:47 INFO impl.NMClientAsyncImpl: Processing Event EventType: QUERY_CONTAINER for Container container_1414519516219_0002_01_000002
14/10/28 18:25:47 INFO appmaster.SliderAppMaster: Deployed instance of role VERTICA_SLAVE onto container_1414519516219_0002_01_000003
14/10/28 18:25:47 INFO appmaster.SliderAppMaster: Registering component container_1414519516219_0002_01_000003
14/10/28 18:25:47 INFO impl.NMClientAsyncImpl: Processing Event EventType: QUERY_CONTAINER for Container container_1414519516219_0002_01_000003
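
For anyone reading a log like this, the RoleStatus entries are the quickest way to see which role never got its container. The following is only a rough sketch in Python, assuming the RoleStatus{...} text format matches the log above; the function names are hypothetical helpers and are not part of Slider. It scans the log text and lists roles whose actual count is below the desired count.

```python
import re

# Matches one RoleStatus{...} entry as it appears in the log above.
ROLE_STATUS = re.compile(r"RoleStatus\{name='(?P<name>[^']+)',\s*(?P<fields>[^}]*)\}")
# Matches numeric counters such as desired=2 or requested=1.
FIELD = re.compile(r"(\w+)=(\d+)")


def parse_role_statuses(log_text):
    """Return one dict per RoleStatus entry found in the log text."""
    statuses = []
    for match in ROLE_STATUS.finditer(log_text):
        counters = {key: int(value) for key, value in FIELD.findall(match.group("fields"))}
        counters["name"] = match.group("name")
        statuses.append(counters)
    return statuses


def unsatisfied_roles(statuses):
    """Names of roles that have fewer actual instances than desired."""
    return [s["name"] for s in statuses if s["actual"] < s["desired"]]


# The Diagnostics entries copied from the log above.
SAMPLE = (
    "RoleStatus{name='slider-appmaster', key=0, desired=1, actual=0, requested=0, "
    "releasing=0, failed=0, started=0, startFailed=0, completed=0, failureMessage=''} "
    "RoleStatus{name='VERTICA_SLAVE', key=2, desired=2, actual=2, requested=0, "
    "releasing=0, failed=0, started=0, startFailed=0, completed=0, failureMessage=''} "
    "RoleStatus{name='VERTICA_MASTER', key=1, desired=1, actual=0, requested=1, "
    "releasing=0, failed=0, started=0, startFailed=0, completed=0, failureMessage=''}"
)

if __name__ == "__main__":
    for status in parse_role_statuses(SAMPLE):
        print(status["name"], "desired:", status["desired"],
              "actual:", status["actual"], "requested:", status["requested"])
    print("unsatisfied:", unsatisfied_roles(parse_role_statuses(SAMPLE)))
```

In the Diagnostics entries above, VERTICA_MASTER is the application role that still has a request outstanding (requested=1, actual=0), and its container request was submitted for vertica1.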

Thanks,
Rui



On 10/28/2014 01:47 PM, Sumit Mohanty wrote:
A bug fix went in a few days ago -
https://issues.apache.org/jira/browse/SLIDER-439 - that specifically addresses
this issue.

thanks
-Sumit

On Tue, Oct 28, 2014 at 10:36 AM, Rui Zhang <[email protected]> wrote:

Hi,

When I kill a node manager manually and restart the application, the
instance that previously ran on that node manager is not able to
restart. Why is this? I think YARN should allocate a container on a
different machine for this instance, right?

Thanks,
Rui

--
Rui Zhang
Software engineer Intern
Vertica, an HP Company
[email protected]




