To run an Accumulo instance successfully on one of these VMs, I've had to
raise Accumulo's ZK timeout from the default 30s to 180s (this is with HDFS,
ZK, and Accumulo all running on the same VM).  So you may simply be running
short on resources.  We could consider increasing, or making configurable,
the ZK timeout or the number of retries that Slider uses for its various ZK
operations.
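
As a rough illustration of the "configurable retries" idea, here is a minimal
sketch of a generic retry wrapper with exponential backoff.  It is not Slider
code: the class name ZkRetry, the method withRetries, and its parameters are
hypothetical, and it uses only the JDK so it can stand in for any ZK call
(such as the exists/delete calls in the trace below).

```java
import java.util.concurrent.Callable;

/**
 * Hypothetical helper, not part of Slider or ZooKeeper: retries an
 * operation a configurable number of times with exponential backoff.
 */
public final class ZkRetry {
    private ZkRetry() {}

    /**
     * @param op               the operation to attempt (e.g. a ZK exists/delete call)
     * @param maxAttempts      total attempts, including the first; must be >= 1
     * @param initialBackoffMs sleep before the second attempt; doubled after each failure
     */
    public static <T> T withRetries(Callable<T> op, int maxAttempts, long initialBackoffMs)
            throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        long backoff = initialBackoffMs;
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return op.call();
            } catch (InterruptedException e) {
                // Do not retry on interruption; restore the flag and give up.
                Thread.currentThread().interrupt();
                throw e;
            } catch (Exception e) {
                // Simplification: every other failure is treated as retryable.
                // Real code would likely retry only on connection loss.
                last = e;
                if (attempt < maxAttempts) {
                    Thread.sleep(backoff);
                    backoff *= 2;
                }
            }
        }
        // All attempts failed; surface the most recent failure.
        throw last;
    }
}
```

A caller could then wrap a single ZK operation, for example
ZkRetry.withRetries(() -> someZkCall(), attempts, backoffMs), with attempts
and backoffMs read from client configuration.  someZkCall and both values are
placeholders here, not existing Slider names or settings.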

On Wed, Feb 25, 2015 at 6:53 AM, Jon Maron <[email protected]> wrote:

> I’ve noticed that I’m having intermittent issues accessing the zookeeper
> quorum during “destroy” attempts:
>
> 2015-02-25 09:48:02,345 [main] WARN  client.SliderClient
> (SliderClient.java:getZkClient(523)) - Unable to connect to zookeeper
> quorum c6402.ambari.apache.org:2181,c6404.ambari.apache.org:2181,
> c6403.ambari.apache.org:2181,c6405.ambari.apache.org:2181
> java.net.ConnectException: Unable to connect to ZK quorum
>         at
> org.apache.slider.core.zk.BlockingZKWatcher.waitForZKConnection(BlockingZKWatcher.java:63)
>         at
> org.apache.slider.client.SliderClient.getZkClient(SliderClient.java:518)
>         at
> org.apache.slider.client.SliderClient.deleteZookeeperNode(SliderClient.java:458)
>         at
> org.apache.slider.client.SliderClient.actionDestroy(SliderClient.java:550)
>         at
> org.apache.slider.client.SliderClient.exec(SliderClient.java:383)
>         at
> org.apache.slider.client.SliderClient.runService(SliderClient.java:348)
>         at
> org.apache.slider.core.main.ServiceLauncher.launchService(ServiceLauncher.java:188)
>         at
> org.apache.slider.core.main.ServiceLauncher.launchServiceRobustly(ServiceLauncher.java:475)
>         at
> org.apache.slider.core.main.ServiceLauncher.launchServiceAndExit(ServiceLauncher.java:403)
>         at
> org.apache.slider.core.main.ServiceLauncher.serviceMain(ServiceLauncher.java:630)
>         at org.apache.slider.Slider.main(Slider.java:49)
> 2015-02-25 09:48:02,656 [main] DEBUG client.SliderClient
> (SliderClient.java:deleteZookeeperNode(474)) - Unable to recursively delete
> zk node /services/slider/users/jmaron/hbase-test
> 2015-02-25 09:48:02,656 [main] DEBUG client.SliderClient
> (SliderClient.java:deleteZookeeperNode(475)) - Reason:
> org.apache.zookeeper.KeeperException$ConnectionLossException:
> KeeperErrorCode = ConnectionLoss for
> /services/slider/users/jmaron/hbase-test
>         at
> org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
>         at
> org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
>         at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1045)
>         at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1073)
>         at
> org.apache.slider.core.zk.ZKIntegration.stat(ZKIntegration.java:164)
>         at
> org.apache.slider.core.zk.ZKIntegration.exists(ZKIntegration.java:160)
>         at
> org.apache.slider.client.SliderClient.deleteZookeeperNode(SliderClient.java:460)
>         at
> org.apache.slider.client.SliderClient.actionDestroy(SliderClient.java:550)
>         at
> org.apache.slider.client.SliderClient.exec(SliderClient.java:383)
>         at
> org.apache.slider.client.SliderClient.runService(SliderClient.java:348)
>         at
> org.apache.slider.core.main.ServiceLauncher.launchService(ServiceLauncher.java:188)
>         at
> org.apache.slider.core.main.ServiceLauncher.launchServiceRobustly(ServiceLauncher.java:475)
>         at
> org.apache.slider.core.main.ServiceLauncher.launchServiceAndExit(ServiceLauncher.java:403)
>         at
> org.apache.slider.core.main.ServiceLauncher.serviceMain(ServiceLauncher.java:630)
>         at org.apache.slider.Slider.main(Slider.java:49)
>
> Any ideas on why that may occur?  My cluster is running on a set of VMs on
> my development box.  These failed ZK interactions will subsequently yield
> issues in trying to recreate the given application (in this case HBase).
>
> — Jon
