It seems I was not running the newly built version after compiling.
Please disregard my question; I am trying the new build now.
On 10/28/2014 02:40 PM, Rui Zhang wrote:
I applied the fix, but it still does not work.
Actually, the steps to reproduce in SLIDER-439 are different from mine.
What I do is first use the "freeze" command and then kill one node
manager. I wait long enough for the node manager to leave the YARN
cluster, and then use the "thaw" command to restart.
However, the instance that was running on the killed node is not able
to restart.
Here is part of the log.
14/10/28 18:25:42 INFO mortbay.log: Logging to
org.slf4j.impl.Log4jLoggerAdapter(org.mortbay.log) via
org.mortbay.log.Slf4jLog
14/10/28 18:25:42 INFO zookeeper.ClientCnxn: Session establishment
complete on server vertica1/172.17.42.1:16433, sessionid =
0x14957f07d6f011f, negotiated timeout = 40000
14/10/28 18:25:42 INFO state.ConnectionStateManager: State change:
CONNECTED
14/10/28 18:25:42 INFO mortbay.log: jetty-6.1.26
Oct 28, 2014 6:25:42 PM com.sun.jersey.api.core.PackagesResourceConfig
init
INFO: Scanning for root resource and provider classes in the packages:
org.apache.slider.server.appmaster.web.rest.agent
Oct 28, 2014 6:25:42 PM com.sun.jersey.api.core.ScanningResourceConfig
logClasses
INFO: Root resource classes found:
class
org.apache.slider.server.appmaster.web.rest.agent.AgentWebServices
Oct 28, 2014 6:25:42 PM com.sun.jersey.api.core.ScanningResourceConfig
init
INFO: No provider classes found.
Oct 28, 2014 6:25:42 PM
com.sun.jersey.server.impl.application.WebApplicationImpl _initiate
INFO: Initiating Jersey application, version 'Jersey: 1.9 09/02/2011
11:17 AM'
14/10/28 18:25:43 INFO mortbay.log: Started
[email protected]:46561
14/10/28 18:25:43 INFO mortbay.log: Started
[email protected]:36451
14/10/28 18:25:43 INFO http.HttpRequestLog: Http request log for
http.requests.slideram is not defined
14/10/28 18:25:43 INFO http.HttpServer2: Added global filter 'safety'
(class=org.apache.hadoop.http.HttpServer2$QuotingInputFilter)
14/10/28 18:25:43 INFO http.HttpServer2: Added filter AM_PROXY_FILTER
(class=org.apache.slider.server.appmaster.web.SliderAmIpFilter) to
context slideram
14/10/28 18:25:43 INFO http.HttpServer2: Added filter AM_PROXY_FILTER
(class=org.apache.slider.server.appmaster.web.SliderAmIpFilter) to
context static
14/10/28 18:25:43 INFO http.HttpServer2: adding path spec: /slideram/*
14/10/28 18:25:43 INFO http.HttpServer2: adding path spec: /ws/*
14/10/28 18:25:43 INFO http.HttpServer2: Jetty bound to port 47481
14/10/28 18:25:43 INFO mortbay.log: jetty-6.1.26
14/10/28 18:25:43 INFO mortbay.log: Extract
jar:file:/home/rzhang/Slider_Vertica/Linux64/Test/verticadb1000/HDP2_1/hadoop/local/usercache/rzhang/appcache/application_1414519516219_0002/filecache/18/slider.jar!/webapps/slideram
to /tmp/Jetty_0_0_0_0_47481_slideram____.7p4s4g/webapp
14/10/28 18:25:43 INFO mortbay.log: Started
[email protected]:47481
14/10/28 18:25:43 INFO webapp.WebApps: Web app /slideram started at 47481
14/10/28 18:25:43 INFO webapp.WebApps: Registered webapp guice modules
14/10/28 18:25:43 INFO appmaster.SliderAppMaster: Connecting to RM at
46522,address tracking URL=http://vertica2.rzhang.com:47481
14/10/28 18:25:43 INFO agent.AgentUtils: Reading metainfo at
hdfs://rzhang-HP-ZBook-15:10433/slider/slider_test.zip
14/10/28 18:25:44 INFO tools.SliderUtils: Reading metainfo.xml of size
3193
14/10/28 18:25:44 INFO agent.HeartbeatMonitor: Starting heartbeat
monitor with interval 60000
14/10/28 18:25:44 INFO state.AppState: Adding new role VERTICA_MASTER
14/10/28 18:25:44 INFO state.AppState: Role VERTICA_MASTER assigned
priority 1
14/10/28 18:25:44 INFO state.AppState: Adding new role VERTICA_SLAVE
14/10/28 18:25:44 INFO state.AppState: Role VERTICA_SLAVE assigned
priority 2
14/10/28 18:25:44 INFO state.AppState: Role slider-appmaster flexed
from 0 to 1
14/10/28 18:25:44 INFO state.AppState: Role VERTICA_SLAVE flexed from
0 to 2
14/10/28 18:25:44 INFO state.AppState: Role VERTICA_MASTER flexed from
0 to 1
14/10/28 18:25:44 INFO state.RoleHistory: loaded history from
hdfs://rzhang-HP-ZBook-15:10433/user/rzhang/.slider/cluster/slider_test/history/rolehistory-0000014957f14d86.json
14/10/28 18:25:44 INFO appmaster.SliderAppMaster: service instances
already running: []
14/10/28 18:25:44 INFO curator.RegistryBinderService: registering
ServiceInstance{name='org-apache-slider', id='slider_test',
address='172.17.0.3', port=47481, sslPort=null,
payload=ServiceInstanceData{id='slider_test',
serviceType='org-apache-slider'}, registrationTimeUTC=1414520744939,
serviceType=DYNAMIC,
uriSpec=org.apache.curator.x.discovery.UriSpec@54515c2}
14/10/28 18:25:45 INFO curator.RegistryBinderService: registration
completed ServiceInstance{name='org-apache-slider', id='slider_test',
address='172.17.0.3', port=47481, sslPort=null,
payload=ServiceInstanceData{id='slider_test',
serviceType='org-apache-slider'}, registrationTimeUTC=1414520744939,
serviceType=DYNAMIC,
uriSpec=org.apache.curator.x.discovery.UriSpec@54515c2}
14/10/28 18:25:45 INFO appmaster.SliderAppMaster: Chaos monkey disabled
14/10/28 18:25:45 INFO appmaster.SliderAppMaster: Adding Chaos Monkey
scheduled every 0 seconds (0 hours)
14/10/28 18:25:45 INFO workflow.WorkflowCompositeService: Child
service completed Service SliderAMProviderService in state
SliderAMProviderService: STOPPED; current service null; queued service
count=0
14/10/28 18:25:45 INFO appmaster.SliderAppMaster: Process has exited
with exit code 0 mapped to 0 -ignoring
14/10/28 18:25:45 INFO state.AppState:
RoleStatus{name='VERTICA_SLAVE', key=2, desired=2, actual=0,
requested=0, releasing=0, failed=0, started=0, startFailed=0,
completed=0, failureMessage=''}
14/10/28 18:25:45 INFO state.AppState: VERTICA_SLAVE: Asking for 2
more nodes(s) for a total of 2
14/10/28 18:25:45 INFO state.RoleHistory: There're 2 nodes to consider
for VERTICA_SLAVE
14/10/28 18:25:45 INFO state.OutstandingRequest: Submitting request
for container on vertica2.rzhang.com
14/10/28 18:25:45 INFO state.AppState: Container ask is
Capability[<memory:1024, vCores:1>]Priority[2]
14/10/28 18:25:45 INFO state.RoleHistory: There're 1 nodes to consider
for VERTICA_SLAVE
14/10/28 18:25:45 INFO state.OutstandingRequest: Submitting request
for container on vertica0.rzhang.com
14/10/28 18:25:45 INFO state.AppState: Container ask is
Capability[<memory:1024, vCores:1>]Priority[2]
14/10/28 18:25:45 INFO state.AppState:
RoleStatus{name='VERTICA_MASTER', key=1, desired=1, actual=0,
requested=0, releasing=0, failed=0, started=0, startFailed=0,
completed=0, failureMessage=''}
14/10/28 18:25:45 INFO state.AppState: VERTICA_MASTER: Asking for 1
more nodes(s) for a total of 1
14/10/28 18:25:45 INFO state.RoleHistory: There're 1 nodes to consider
for VERTICA_MASTER
14/10/28 18:25:45 INFO state.OutstandingRequest: Submitting request
for container on vertica1
14/10/28 18:25:45 INFO state.AppState: Container ask is
Capability[<memory:1024, vCores:1>]Priority[1]
14/10/28 18:25:45 INFO util.RackResolver: Resolved vertica2.rzhang.com
to /default-rack
14/10/28 18:25:45 INFO util.RackResolver: Resolved vertica0.rzhang.com
to /default-rack
14/10/28 18:25:45 INFO util.RackResolver: Resolved vertica1 to
/default-rack
14/10/28 18:25:46 INFO impl.AMRMClientImpl: Received new token for :
vertica0.rzhang.com:54106
14/10/28 18:25:46 INFO impl.AMRMClientImpl: Received new token for :
vertica2.rzhang.com:41175
14/10/28 18:25:46 INFO appmaster.SliderAppMaster:
onContainersAllocated(2)
14/10/28 18:25:46 INFO state.AppState: Assigning role VERTICA_SLAVE to
container container_1414519516219_0002_01_000002, on
vertica0.rzhang.com:54106,
14/10/28 18:25:46 INFO state.AppState: Assigning role VERTICA_SLAVE to
container container_1414519516219_0002_01_000003, on
vertica2.rzhang.com:41175,
14/10/28 18:25:46 INFO appmaster.SliderAppMaster: Diagnostics:
RoleStatus{name='slider-appmaster', key=0, desired=1, actual=0,
requested=0, releasing=0, failed=0, started=0, startFailed=0,
completed=0, failureMessage=''}
RoleStatus{name='VERTICA_SLAVE', key=2, desired=2, actual=2,
requested=0, releasing=0, failed=0, started=0, startFailed=0,
completed=0, failureMessage=''}
RoleStatus{name='VERTICA_MASTER', key=1, desired=1, actual=0,
requested=1, releasing=0, failed=0, started=0, startFailed=0,
completed=0, failureMessage=''}
14/10/28 18:25:46 INFO agent.AgentProviderService: Build launch
context for Agent
14/10/28 18:25:46 INFO agent.AgentProviderService: Build launch
context for Agent
14/10/28 18:25:46 INFO agent.AgentProviderService: AGENT_WORK_ROOT set
to $PWD
14/10/28 18:25:46 INFO agent.AgentProviderService: AGENT_LOG_ROOT set
to $LOG_DIRS
14/10/28 18:25:46 INFO agent.AgentProviderService: PYTHONPATH set to
./infra/agent/slider-agent/
14/10/28 18:25:46 INFO agent.AgentProviderService: AGENT_WORK_ROOT set
to $PWD
14/10/28 18:25:46 INFO agent.AgentProviderService: AGENT_LOG_ROOT set
to $LOG_DIRS
14/10/28 18:25:46 INFO agent.AgentProviderService: PYTHONPATH set to
./infra/agent/slider-agent/
14/10/28 18:25:46 INFO agent.AgentProviderService: Using
./infra/agent/slider-agent/agent/main.py for agent.
14/10/28 18:25:46 INFO agent.AgentProviderService: Using
./infra/agent/slider-agent/agent/main.py for agent.
14/10/28 18:25:46 INFO appmaster.RoleLaunchService: Starting container
with command: python ./infra/agent/slider-agent/agent/main.py --label
container_1414519516219_0002_01_000002___VERTICA_SLAVE --zk-quorum
rzhang-HP-ZBook-15:16433 --zk-reg-path
/registry/org-apache-slider/slider_test ;
14/10/28 18:25:46 INFO appmaster.RoleLaunchService: Starting container
with command: python ./infra/agent/slider-agent/agent/main.py --label
container_1414519516219_0002_01_000003___VERTICA_SLAVE --zk-quorum
rzhang-HP-ZBook-15:16433 --zk-reg-path
/registry/org-apache-slider/slider_test ;
14/10/28 18:25:46 INFO impl.NMClientAsyncImpl: Processing Event
EventType: START_CONTAINER for Container
container_1414519516219_0002_01_000002
14/10/28 18:25:46 INFO impl.NMClientAsyncImpl: Processing Event
EventType: START_CONTAINER for Container
container_1414519516219_0002_01_000003
14/10/28 18:25:46 INFO impl.ContainerManagementProtocolProxy: Opening
proxy : vertica0.rzhang.com:54106
14/10/28 18:25:46 INFO impl.ContainerManagementProtocolProxy: Opening
proxy : vertica2.rzhang.com:41175
14/10/28 18:25:46 INFO appmaster.SliderAppMaster: Started Container
container_1414519516219_0002_01_000002
14/10/28 18:25:46 INFO appmaster.SliderAppMaster: Started Container
container_1414519516219_0002_01_000003
14/10/28 18:25:47 INFO appmaster.SliderAppMaster: Deployed instance of
role VERTICA_SLAVE onto container_1414519516219_0002_01_000002
14/10/28 18:25:47 INFO appmaster.SliderAppMaster: Registering
component container_1414519516219_0002_01_000002
14/10/28 18:25:47 INFO impl.NMClientAsyncImpl: Processing Event
EventType: QUERY_CONTAINER for Container
container_1414519516219_0002_01_000002
14/10/28 18:25:47 INFO appmaster.SliderAppMaster: Deployed instance of
role VERTICA_SLAVE onto container_1414519516219_0002_01_000003
14/10/28 18:25:47 INFO appmaster.SliderAppMaster: Registering
component container_1414519516219_0002_01_000003
14/10/28 18:25:47 INFO impl.NMClientAsyncImpl: Processing Event
EventType: QUERY_CONTAINER for Container
container_1414519516219_0002_01_000003
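To make a log like the one above easier to read, here is a minimal sketch that pulls out which roles still have pending container requests and which requested hosts never received a container. The function names (latest_role_status, pending_roles, unfulfilled_hosts) are hypothetical helpers, not part of Slider, and the regular expressions assume the exact log line formats shown in this excerpt.

```python
import re
from collections import Counter

# Patterns assume the Slider AM log formats seen in the excerpt above.
ROLE_STATUS = re.compile(
    r"RoleStatus\{name='(?P<name>[^']+)',\s*key=\d+,\s*"
    r"desired=(?P<desired>\d+),\s*actual=(?P<actual>\d+),\s*"
    r"requested=(?P<requested>\d+)"
)
REQUESTED_HOST = re.compile(r"Submitting request for container on (\S+)")
ASSIGNED_HOST = re.compile(r"Assigning role \S+ to container \S+, on ([^:\s]+):")


def flatten(log_text):
    """Undo mail-client line wrapping so each entry can be matched as one string."""
    return " ".join(log_text.split())


def latest_role_status(log_text):
    """Map role name -> (desired, actual, requested), keeping the last entry per role."""
    status = {}
    for m in ROLE_STATUS.finditer(flatten(log_text)):
        status[m["name"]] = (int(m["desired"]), int(m["actual"]), int(m["requested"]))
    return status


def pending_roles(status):
    """Roles that still have outstanding container requests."""
    return sorted(name for name, (_, _, requested) in status.items() if requested > 0)


def unfulfilled_hosts(log_text):
    """Hosts a container was requested on but never assigned on."""
    flat = flatten(log_text)
    requested = Counter(REQUESTED_HOST.findall(flat))
    assigned = Counter(ASSIGNED_HOST.findall(flat))
    return sorted((requested - assigned).elements())


# A shortened copy of the relevant lines from the log above.
SAMPLE = """
14/10/28 18:25:45 INFO state.OutstandingRequest: Submitting request
for container on vertica2.rzhang.com
14/10/28 18:25:45 INFO state.OutstandingRequest: Submitting request
for container on vertica0.rzhang.com
14/10/28 18:25:45 INFO state.OutstandingRequest: Submitting request
for container on vertica1
14/10/28 18:25:46 INFO state.AppState: Assigning role VERTICA_SLAVE to
container container_1414519516219_0002_01_000002, on
vertica0.rzhang.com:54106,
14/10/28 18:25:46 INFO state.AppState: Assigning role VERTICA_SLAVE to
container container_1414519516219_0002_01_000003, on
vertica2.rzhang.com:41175,
RoleStatus{name='VERTICA_SLAVE', key=2, desired=2, actual=2,
requested=0, releasing=0, failed=0, started=0, startFailed=0,
completed=0, failureMessage=''}
RoleStatus{name='VERTICA_MASTER', key=1, desired=1, actual=0,
requested=1, releasing=0, failed=0, started=0, startFailed=0,
completed=0, failureMessage=''}
"""

if __name__ == "__main__":
    print("pending roles:", pending_roles(latest_role_status(SAMPLE)))
    print("unfulfilled hosts:", unfulfilled_hosts(SAMPLE))
```

On this excerpt the only role with a pending request is VERTICA_MASTER, and the only host with a request but no assignment is vertica1. That would be consistent with the request being tied to a host that is no longer available, though the excerpt alone does not show which node was killed.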
Thanks,
Rui
On 10/28/2014 01:47 PM, Sumit Mohanty wrote:
There is a bug fix that went in a few days back -
https://issues.apache.org/jira/browse/SLIDER-439 - that specifically
fixed this issue.
thanks
-Sumit
On Tue, Oct 28, 2014 at 10:36 AM, Rui Zhang <[email protected]> wrote:
Hi,
When I kill a node manager manually and restart the application, it
seems that an instance that previously ran on that node manager is not
able to restart. Why is this? I think YARN should allocate a container
on a different machine for this instance, right?
Thanks,
Rui
--
Rui Zhang
Software engineer Intern
Vertica, an HP Company
[email protected]