To run an Accumulo instance on one of these VMs successfully, I've had to raise Accumulo's ZooKeeper timeout from the default 30s to 180s (this is with HDFS, ZooKeeper, and Accumulo all running on the same VM). So you may simply be running short on resources. We could consider increasing the ZooKeeper timeout or the number of retries that Slider uses for its various ZooKeeper operations, or making them configurable.
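
For what it's worth, here is a rough sketch of what a configurable retry could look like. It is plain Java with no Slider or ZooKeeper dependencies, and `ZkRetry`, `Settings`, `maxAttempts`, and `pauseMillis` are names made up for illustration, not existing Slider classes or options.

```java
import java.util.concurrent.Callable;

/** Hypothetical helper (not part of Slider): retries an operation a configurable number of times. */
final class ZkRetry {

    /** Hypothetical settings holder; these would come from client configuration. */
    static final class Settings {
        final int maxAttempts;   // total attempts, including the first one
        final long pauseMillis;  // fixed pause between attempts

        Settings(int maxAttempts, long pauseMillis) {
            if (maxAttempts < 1) {
                throw new IllegalArgumentException("maxAttempts must be >= 1");
            }
            if (pauseMillis < 0) {
                throw new IllegalArgumentException("pauseMillis must be >= 0");
            }
            this.maxAttempts = maxAttempts;
            this.pauseMillis = pauseMillis;
        }
    }

    private ZkRetry() {
    }

    /**
     * Runs op until it succeeds or maxAttempts is reached.
     * The exception from the final failed attempt is rethrown.
     */
    static <T> T callWithRetries(Callable<T> op, Settings settings) throws Exception {
        Exception last = null;
        for (int attempt = 1; attempt <= settings.maxAttempts; attempt++) {
            try {
                return op.call();
            } catch (InterruptedException e) {
                // Do not retry after an interrupt; restore the flag and give up.
                Thread.currentThread().interrupt();
                throw e;
            } catch (Exception e) {
                last = e;
                if (attempt < settings.maxAttempts) {
                    Thread.sleep(settings.pauseMillis);
                }
            }
        }
        throw last;
    }
}
```

The ZooKeeper call that fails in the quoted trace could then be wrapped as something like `ZkRetry.callWithRetries(() -> zk.exists(path), settings)`, where `zk`, `path`, and `settings` are placeholders. A fixed pause keeps the sketch short; a backoff that grows between attempts may suit a slow VM better.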

On Wed, Feb 25, 2015 at 6:53 AM, Jon Maron <[email protected]> wrote:

> I’ve noticed that I’m having intermittent issues accessing the zookeeper
> quorum during “destroy” attempts:
>
> 2015-02-25 09:48:02,345 [main] WARN client.SliderClient (SliderClient.java:getZkClient(523)) - Unable to connect to zookeeper quorum c6402.ambari.apache.org:2181,c6404.ambari.apache.org:2181,c6403.ambari.apache.org:2181,c6405.ambari.apache.org:2181
> java.net.ConnectException: Unable to connect to ZK quorum
>     at org.apache.slider.core.zk.BlockingZKWatcher.waitForZKConnection(BlockingZKWatcher.java:63)
>     at org.apache.slider.client.SliderClient.getZkClient(SliderClient.java:518)
>     at org.apache.slider.client.SliderClient.deleteZookeeperNode(SliderClient.java:458)
>     at org.apache.slider.client.SliderClient.actionDestroy(SliderClient.java:550)
>     at org.apache.slider.client.SliderClient.exec(SliderClient.java:383)
>     at org.apache.slider.client.SliderClient.runService(SliderClient.java:348)
>     at org.apache.slider.core.main.ServiceLauncher.launchService(ServiceLauncher.java:188)
>     at org.apache.slider.core.main.ServiceLauncher.launchServiceRobustly(ServiceLauncher.java:475)
>     at org.apache.slider.core.main.ServiceLauncher.launchServiceAndExit(ServiceLauncher.java:403)
>     at org.apache.slider.core.main.ServiceLauncher.serviceMain(ServiceLauncher.java:630)
>     at org.apache.slider.Slider.main(Slider.java:49)
> 2015-02-25 09:48:02,656 [main] DEBUG client.SliderClient (SliderClient.java:deleteZookeeperNode(474)) - Unable to recursively delete zk node /services/slider/users/jmaron/hbase-test
> 2015-02-25 09:48:02,656 [main] DEBUG client.SliderClient (SliderClient.java:deleteZookeeperNode(475)) - Reason:
> org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /services/slider/users/jmaron/hbase-test
>     at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
>     at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
>     at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1045)
>     at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1073)
>     at org.apache.slider.core.zk.ZKIntegration.stat(ZKIntegration.java:164)
>     at org.apache.slider.core.zk.ZKIntegration.exists(ZKIntegration.java:160)
>     at org.apache.slider.client.SliderClient.deleteZookeeperNode(SliderClient.java:460)
>     at org.apache.slider.client.SliderClient.actionDestroy(SliderClient.java:550)
>     at org.apache.slider.client.SliderClient.exec(SliderClient.java:383)
>     at org.apache.slider.client.SliderClient.runService(SliderClient.java:348)
>     at org.apache.slider.core.main.ServiceLauncher.launchService(ServiceLauncher.java:188)
>     at org.apache.slider.core.main.ServiceLauncher.launchServiceRobustly(ServiceLauncher.java:475)
>     at org.apache.slider.core.main.ServiceLauncher.launchServiceAndExit(ServiceLauncher.java:403)
>     at org.apache.slider.core.main.ServiceLauncher.serviceMain(ServiceLauncher.java:630)
>     at org.apache.slider.Slider.main(Slider.java:49)
>
> Any ideas on why that may occur? My cluster is running on a set of VMs on
> my development box. These failed ZK interactions will subsequently yield
> issues in trying to recreate the given application (in this case HBase)
>
> — Jon
