Hello All:
I was able to get this working after some code diving, log reading, RTFM, etc.
The following issues came up. I'm not sure whether the lessons I learned are of
general interest, but here they are:
1. Slider needs a ZooKeeper instance for the YARN registry service to work.
I haven't determined which parameters are optional, but setting the
hadoop.registry.zk.quorum property was definitely needed. The command line
that worked is:
* slider create jmemcached --template
/home/foolish_ewe/mybuild/incubator-slider/app-packages/memcached/appConfig.json
--resources
/home/foolish_ewe/mybuild/incubator-slider/app-packages/memcached/resources-default.json
--manager yarn-rm.cluster.mycompany.com:8032 --debug --zkhosts
zknode.zkcluster.mycompany.com:2181 --zkpath /slider_test/clustername/ -D
hadoop.registry.zk.quorum=zknode.zkcluster.mycompany.com:2181 -D
yarn.nodemanager.delete.debug-delay-sec=3600 -D
yarn.nodemanager.sleep-delay-before-sigkill.ms=3600000
2. Slider can be installed in a user directory and run as a normal user for
testing; root permissions are not needed (great if you want to avoid pestering
your ops team).
3. If the launched application (e.g. memcached in this case) fails quickly,
it can be really hard to view the memcached container's logs. I had to try
many times before I was quick enough to view them, and initially I was confused
and thought the registration had failed. In case you are wondering why it
failed: the java_home setting for the docker image was not consistent with our
actual cluster. The yarn.nodemanager.* settings in the command above didn't
seem to help that much with this. Are there any better hints for reading the
logs of failed containers?
4. It would be nice if the Container statistics page showed both the
container and the node running it (the link gives a hint, but it would be
nice to see it directly).
5. I had to manually remove the files installed in HDFS between test runs to
get a clean test shot.
6. The log4j.properties and log4j-server.properties files in the installation's
slider/conf directory are useful. Since I'm running out of my home directory I
was able to edit them, but with an rpm install, normal users might need to
escalate privileges to edit them. I'm testing a version of the client that
lets us replace the client log4j.properties with a user-defined one (for
testing).
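For item 1, here is a rough sketch of how I would keep the two ZooKeeper
settings in step, since passing --zkhosts alone was not enough for me. The
names ZK_QUORUM, PKG_DIR and build_create_cmd are made up for illustration and
are not part of Slider; the flags are copied from the command above, with
--debug and the yarn.nodemanager.* options left out for brevity. It only
prints the command, it does not run it.

```shell
#!/bin/sh
# Sketch only: ZK_QUORUM, PKG_DIR and build_create_cmd are illustrative names,
# not part of Slider. The flags themselves are copied from item 1.
ZK_QUORUM="zknode.zkcluster.mycompany.com:2181"
PKG_DIR="/home/foolish_ewe/mybuild/incubator-slider/app-packages/memcached"

build_create_cmd() {
  # Print the command instead of running it, so it can be reviewed first.
  # The same variable feeds both --zkhosts and hadoop.registry.zk.quorum.
  printf '%s ' \
    slider create jmemcached \
    --template "$PKG_DIR/appConfig.json" \
    --resources "$PKG_DIR/resources-default.json" \
    --manager yarn-rm.cluster.mycompany.com:8032 \
    --zkhosts "$ZK_QUORUM" \
    --zkpath /slider_test/clustername/ \
    -D "hadoop.registry.zk.quorum=$ZK_QUORUM"
  printf '\n'
}

build_create_cmd
```

Whether --zkhosts is still required once the -D property is set is something I
have not verified.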
With best regards:
Bill
________________________________
From: Foolish Ewe <[email protected]>
Sent: Wednesday, May 3, 2017 1:47 AM
To: [email protected]
Subject: How to port a working slider and memcached example from docker image
to a cluster?
I have a version of the memcached example running on a docker image, and now
I'd like to port that to a real cluster (to get a working starting point for
the actual service I want to run in slider).
I suspect the configuration issues could be in the ZooKeeper or YARN service
registry configuration.
Running the following (sanitized) commands:
slider install-package --package
/home/foolish_ewe/mybuild/incubator-slider/app-packages/memcached/jmemcached-1.0.1.zip
--name jmemcached --debug --replacepkg
slider create jmemcached --template
/home/foolish_ewe/mybuild/incubator-slider/app-packages/memcached/appConfig.json
--resources
/home/foolish_ewe/mybuild/incubator-slider/app-packages/memcached/resources-default.json
--manager rm.yarn.cluster.mycompany.com:8032 --debug --zkhosts
zookeeper.cluster.mycompany.com:2181 --zkpath /slider_test/clustername/
I'm seeing failed zookeeper connections to localhost:2181 in the AM logs:
2017-05-02 16:16:07,992 [main-SendThread(localhost:2181)] WARN
zookeeper.ClientCnxn - Session 0x0 for server null, unexpected error, closing
socket connection and attempting reconnect
java.net.ConnectException: Connection refused
How can I tweak the connection string?
If I look at slider/conf/slider-client.xml, I am still using the default
configuration and see the following setting:
<property>
<name>hadoop.registry.zk.quorum</name>
<value>@ZK-QUORUM</value>
</property>
First off, I'm not sure what the @ZK-QUORUM syntax means. Overriding this value
with a connection string for a single host provides no relief from the dreaded
symptom.
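A quick way to see whether the file still carries a token like this is
sketched here. It assumes (my guess, not something I have confirmed) that
@ZK-QUORUM is a template placeholder meant to be substituted at install time.
has_placeholder is a made-up helper name, and the default path is just the
slider/conf/slider-client.xml mentioned above.

```shell
#!/bin/sh
# has_placeholder is a hypothetical helper; it only assumes that unresolved
# template tokens look like "@ZK-QUORUM" directly inside a <value> element.
has_placeholder() {
  # Succeeds (exit 0) if any <value> in the given file starts with "@".
  grep -q '<value>@' "$1"
}

# Path is an assumption; pass the real location as the first argument.
conf="${1:-slider/conf/slider-client.xml}"
if [ -f "$conf" ] && has_placeholder "$conf"; then
  echo "unresolved placeholder in $conf:"
  grep -n '<value>@' "$conf"
fi
```

If the file does not exist, or no value starts with "@", it prints nothing.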
The AM logs look like:
2017-05-02 16:16:07,401 [main] INFO appmaster.SliderAppMaster - Registry
service username =fooolish_ewe
2017-05-02 16:16:07,462 [main] INFO appmaster.SliderAppMaster - Service Record
ServiceRecord{description='Slider Application Master'; external endpoints: {{
"api" : "http://",
"addressType" : "uri",
"protocolType" : "webui",
"addresses" : [ {
"uri" : "http://cluster.mycompany.com:42734"
} ]
}; {
"api" : "classpath:org.apache.slider.management",
"addressType" : "uri",
"protocolType" : "REST",
"addresses" : [ {
"uri" : "http://cluster.mycompany.com:42734/ws/v1/slider/mgmt"
} ]
}; {
"api" : "classpath:org.apache.slider.publisher",
"addressType" : "uri",
"protocolType" : "REST",
"addresses" : [ {
"uri" : "http://cluster.mycompany.com:42734/ws/v1/slider/publisher"
} ]
}; {
"api" : "classpath:org.apache.slider.registry",
"addressType" : "uri",
"protocolType" : "REST",
"addresses" : [ {
"uri" : "http://cluster.mycompany.com:42734/ws/v1/slider/registry"
} ]
}; {
"api" : "classpath:org.apache.slider.publisher.configurations",
"addressType" : "uri",
"protocolType" : "REST",
"addresses" : [ {
"uri" : "http://cluster.mycompany.com:42734/ws/v1/slider/publisher/slider"
} ]
}; {
"api" : "classpath:org.apache.slider.publisher.exports",
"addressType" : "uri",
"protocolType" : "REST",
"addresses" : [ {
"uri" : "http://cluster.mycompany.com:42734/ws/v1/slider/publisher/exports"
} ]
}; }; internal endpoints: {{
"api" : "classpath:org.apache.slider.agents.secure",
"addressType" : "uri",
"protocolType" : "REST",
"addresses" : [ {
"uri" : "https://cluster.mycompany.com:40466/ws/v1/slider/agents"
} ]
}; {
"api" : "classpath:org.apache.slider.agents.oneway",
"addressType" : "uri",
"protocolType" : "REST",
"addresses" : [ {
"uri" : "https://cluster.mycompany.com:59141/ws/v1/slider/agents"
} ]
}; }, attributes: {"yarn:id"="application_1492599342357_0064"
"yarn:persistence"="application" }}
2017-05-02 16:16:07,992 [main-SendThread(localhost:2181)] WARN
zookeeper.ClientCnxn - Session 0x0 for server null, unexpected error, closing
socket connection and attempting reconnect
java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
at
org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1081)
[Several repetitions of the previous error omitted for clarity and then...]
2017-05-02 16:16:12,877 [780172372@qtp-747004588-0] ERROR webapp.Dispatcher -
error handling URI: /slideram
java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.yarn.webapp.Dispatcher.service(Dispatcher.java:153)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
at
com.google.inject.servlet.ServletDefinition.doService(ServletDefinition.java:263)
at
com.google.inject.servlet.ServletDefinition.service(ServletDefinition.java:178)
at
com.google.inject.servlet.ManagedServletPipeline.service(ManagedServletPipeline.java:91)
at
com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:62)
at
com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:900)
at
com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:834)
at
com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:795)
at
com.google.inject.servlet.FilterDefinition.doFilter(FilterDefinition.java:163)
at
com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:58)
at
com.google.inject.servlet.ManagedFilterPipeline.dispatch(ManagedFilterPipeline.java:118)
at com.google.inject.servlet.GuiceFilter.doFilter(GuiceFilter.java:113)
at
org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
at
org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter.doFilter(AmIpFilter.java:164)
at
org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
at
org.apache.hadoop.http.HttpServer2$QuotingInputFilter.doFilter(HttpServer2.java:1286)
at
org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
at org.apache.hadoop.http.NoCacheFilter.doFilter(NoCacheFilter.java:45)
at
org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
at
org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
at org.mortbay.jetty.Server.handle(Server.java:326)
at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
at
org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:928)
at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549)
at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212)
at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
at org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:410)
at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)
Caused by: java.lang.NullPointerException
at
org.apache.slider.providers.AbstractProviderService.buildEndpointDetails(AbstractProviderService.java:352)
at
org.apache.slider.providers.AbstractProviderService.buildMonitorDetails(AbstractProviderService.java:337)
at
org.apache.slider.providers.agent.AgentProviderService.buildMonitorDetails(AgentProviderService.java:810)
at
org.apache.slider.server.appmaster.web.view.IndexBlock.addProviderServiceOptions(IndexBlock.java:129)
at
org.apache.slider.server.appmaster.web.view.IndexBlock.doIndex(IndexBlock.java:85)
at
org.apache.slider.server.appmaster.web.view.IndexBlock.render(IndexBlock.java:60)
at org.apache.hadoop.yarn.webapp.view.HtmlBlock.render(HtmlBlock.java:67)
at org.apache.hadoop.yarn.webapp.view.HtmlBlock.renderPartial(HtmlBlock.java:77)
at org.apache.hadoop.yarn.webapp.View.render(View.java:235)
at org.apache.hadoop.yarn.webapp.view.HtmlPage$Page.subView(HtmlPage.java:49)
at org.apache.hadoop.yarn.webapp.hamlet.HamletImpl$EImp._v(HamletImpl.java:117)
at org.apache.hadoop.yarn.webapp.hamlet.Hamlet$TD._(Hamlet.java:845)
at
org.apache.hadoop.yarn.webapp.view.TwoColumnLayout.render(TwoColumnLayout.java:56)
at org.apache.hadoop.yarn.webapp.view.HtmlPage.render(HtmlPage.java:82)
at org.apache.hadoop.yarn.webapp.Controller.render(Controller.java:212)
at
org.apache.slider.server.appmaster.web.SliderAMController.index(SliderAMController.java:47)
... 39 more
2017-05-02 16:16:13,495 [main-SendThread(localhost:2181)] WARN
zookeeper.ClientCnxn - Session 0x0 for server null, unexpected error, closing
socket connection and attempting reconnect
java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
at
org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1081)
[More repetitions of the previous error deleted]
2017-05-02 16:16:22,474 [main] ERROR curator.ConnectionState - Connection timed
out for connection string (localhost:2181) and timeout (15000) / elapsed (18944)
org.apache.curator.CuratorConnectionLossException: KeeperErrorCode =
ConnectionLoss
at org.apache.curator.ConnectionState.checkTimeouts(ConnectionState.java:198)
at org.apache.curator.ConnectionState.getZooKeeper(ConnectionState.java:88)
at
org.apache.curator.CuratorZookeeperClient.getZooKeeper(CuratorZookeeperClient.java:113)
at
org.apache.curator.framework.imps.CuratorFrameworkImpl.getZooKeeper(CuratorFrameworkImpl.java:457)
at
org.apache.curator.framework.imps.DeleteBuilderImpl$5.call(DeleteBuilderImpl.java:239)
at
org.apache.curator.framework.imps.DeleteBuilderImpl$5.call(DeleteBuilderImpl.java:234)
at org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:107)
at
org.apache.curator.framework.imps.DeleteBuilderImpl.pathInForeground(DeleteBuilderImpl.java:230)
at
org.apache.curator.framework.imps.DeleteBuilderImpl.forPath(DeleteBuilderImpl.java:215)
at
org.apache.curator.framework.imps.DeleteBuilderImpl.forPath(DeleteBuilderImpl.java:42)
at
org.apache.hadoop.registry.client.impl.zk.CuratorService.zkDelete(CuratorService.java:673)
at
org.apache.hadoop.registry.client.impl.zk.RegistryOperationsService.delete(RegistryOperationsService.java:160)
at
org.apache.slider.server.services.yarnregistry.YarnRegistryViewForProviders.putService(YarnRegistryViewForProviders.java:186)
at
org.apache.slider.server.services.yarnregistry.YarnRegistryViewForProviders.registerSelf(YarnRegistryViewForProviders.java:224)
at
org.apache.slider.server.appmaster.SliderAppMaster.registerServiceInstance(SliderAppMaster.java:1084)
at
org.apache.slider.server.appmaster.SliderAppMaster.createAndRunCluster(SliderAppMaster.java:885)
at
org.apache.slider.server.appmaster.SliderAppMaster.runService(SliderAppMaster.java:525)
at
org.apache.slider.core.main.ServiceLauncher.launchService(ServiceLauncher.java:188)
at
org.apache.slider.core.main.ServiceLauncher.launchServiceRobustly(ServiceLauncher.java:475)
at
org.apache.slider.core.main.ServiceLauncher.launchServiceAndExit(ServiceLauncher.java:403)
at
org.apache.slider.core.main.ServiceLauncher.serviceMain(ServiceLauncher.java:630)
at
org.apache.slider.server.appmaster.SliderAppMaster.main(SliderAppMaster.java:2240)
2017-05-02 16:16:23,403 [main-SendThread(localhost:2181)] WARN
zookeeper.ClientCnxn - Session 0x0 for server null, unexpected error, closing
socket connection and attempting reconnect
java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
at
org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1081)
2017-05-02 16:16:24,504 [main-SendThread(localhost:2181)] WARN
zookeeper.ClientCnxn - Session 0x0 for server null, unexpected error, closing
socket connection and attempting reconnect
java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
at
org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1081)
With best regards:
Bill