I may have missed it as it went by, but what was the evidence that the zk quorum actually includes all the zookeeper nodes? This could be answered by examining the logs, but a simpler and more definitive test might be to configure the drillbits to use only one zk node instead of three.
The rationale here is that if different drillbits talked to different zk nodes, they could well not have known about each other.

Sent from my iPhone

> On Oct 27, 2014, at 10:26, Chris Drawater <[email protected]> wrote:
>
> Ramana Inukonda <rinukonda@...> writes:
>
>> Could you look at the zookeeper logs and see if there is any information
>> there? Zookeeper logs should be at zk install location/ logs. There should
>> be two files. A .log and .out. Please check both.
>>
>> Regards
>> Ramana
>
> Thanks Ramana.
>
> We've now isolated our 3 * VMs onto their own private network...
>
> Now we see the following in the DrillBit.log :
>
> 2014-10-27 15:34:37,461 [d80e5b2c-3658-47ff-be30-fe884475feab:frag:0:0] WARN o.a.d.e.p.impl.SendingAccountor - Failure while waiting for send complete.
> java.lang.InterruptedException: null
>     at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:996) ~[na:1.7.0_65]
>     at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1303) ~[na:1.7.0_65]
>     at java.util.concurrent.Semaphore.acquire(Semaphore.java:472) ~[na:1.7.0_65]
>     at org.apache.drill.exec.physical.impl.SendingAccountor.waitForSendComplete(SendingAccountor.java:44) ~[drill-java-exec-0.6.0-incubating-rebuffed.jar:0.6.0-incubating]
>     at org.apache.drill.exec.physical.impl.ScreenCreator$ScreenRoot.stop(ScreenCreator.java:186) [drill-java-exec-0.6.0-incubating-rebuffed.jar:0.6.0-incubating]
>     at org.apache.drill.exec.work.fragment.FragmentExecutor.closeOutResources(FragmentExecutor.java:134) [drill-java-exec-0.6.0-incubating-rebuffed.jar:0.6.0-incubating]
>     at org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:109) [drill-java-exec-0.6.0-incubating-rebuffed.jar:0.6.0-incubating]
>     at org.apache.drill.exec.work.WorkManager$RunnableWrapper.run(WorkManager.java:250) [drill-java-exec-0.6.0-incubating-rebuffed.jar:0.6.0-incubating]
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [na:1.7.0_65]
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [na:1.7.0_65]
>     at java.lang.Thread.run(Thread.java:745) [na:1.7.0_65]
>
> but no corresponding errors in the Zookeeper logs...
>
> Chris
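
On the quorum question at the top of this message: one way to check without reading logs is to ask each zookeeper node what role it thinks it has, using ZooKeeper's four-letter `srvr` command. Below is a rough sketch, not a tested tool. The host names in the usage note are hypothetical, port 2181 is only the usual default, and the helper names are mine. On newer ZooKeeper releases the four-letter commands may need to be whitelisted on the server before they respond.

```python
import socket


def four_letter_word(host, port=2181, cmd=b"srvr", timeout=5.0):
    """Send a ZooKeeper four-letter command and return the reply as text."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(cmd)
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # server closes the connection after replying
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def parse_mode(reply):
    """Return the value of the 'Mode:' line (leader, follower, standalone), or None."""
    for line in reply.splitlines():
        if line.startswith("Mode:"):
            return line.split(":", 1)[1].strip()
    return None


def quorum_looks_healthy(modes):
    """True if exactly one node is the leader and all the others are followers."""
    values = list(modes.values())
    return values.count("leader") == 1 and all(
        mode in ("leader", "follower") for mode in values
    )


def check_quorum(hosts, port=2181):
    """Query every host and return a dict of host -> reported mode (None on failure)."""
    modes = {}
    for host in hosts:
        try:
            modes[host] = parse_mode(four_letter_word(host, port))
        except OSError:
            # Unreachable or unresolved node: record it as having no mode.
            modes[host] = None
    return modes
```

Usage would be something like `modes = check_quorum(["zk1", "zk2", "zk3"])` followed by `quorum_looks_healthy(modes)`. If all three nodes report `standalone`, or more than one reports `leader`, the nodes are not in a single quorum, and drillbits registering with different nodes would not see each other.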
