Sorry, I had jury duty yesterday, so I wasn't able to respond until now. No, leader election fails as well. What's interesting is that the bind of the leader socket/port itself never seems to complete. I enabled tracing, and without a valid DNS server running, all ZooKeeper servers just hang at startup. The last lines printed in every log file before the hang are:

2012-05-23 08:49:29,882 [myid:1] - INFO [main:QuorumPeerMain@131][] - Starting quorum peer
2012-05-23 08:49:29,893 [myid:1] - INFO [main:NIOServerCnxnFactory@108][] - binding to port /127.0.0.2:2181
2012-05-23 08:49:29,899 [myid:1] - INFO [main:QuorumPeer@1107][] - tickTime set to 2000
2012-05-23 08:49:29,899 [myid:1] - INFO [main:QuorumPeer@1127][] - minSessionTimeout set to -1
2012-05-23 08:49:29,899 [myid:1] - INFO [main:QuorumPeer@1138][] - maxSessionTimeout set to -1
2012-05-23 08:49:29,899 [myid:1] - INFO [main:QuorumPeer@1153][] - initLimit set to 10

Now, if I repeat the test with a functioning DNS server running, it churns along as expected:

2012-05-23 08:58:03,468 [myid:1] - INFO [main:QuorumPeerMain@131][] - Starting quorum peer
2012-05-23 08:58:03,479 [myid:1] - INFO [main:NIOServerCnxnFactory@108][] - binding to port /127.0.0.2:2181
2012-05-23 08:58:03,485 [myid:1] - INFO [main:QuorumPeer@1107][] - tickTime set to 2000
2012-05-23 08:58:03,485 [myid:1] - INFO [main:QuorumPeer@1127][] - minSessionTimeout set to -1
2012-05-23 08:58:03,485 [myid:1] - INFO [main:QuorumPeer@1138][] - maxSessionTimeout set to -1
2012-05-23 08:58:03,485 [myid:1] - INFO [main:QuorumPeer@1153][] - initLimit set to 10
2012-05-23 08:58:03,665 [myid:1] - INFO [main:QuorumPeer@620][] - currentEpoch not found! Creating with a reasonable default of 0. This should only happen when you are upgrading your installation
2012-05-23 08:58:03,666 [myid:1] - INFO [main:QuorumPeer@635][] - acceptedEpoch not found! Creating with a reasonable default of 0. This should only happen when you are upgrading your installation
2012-05-23 08:58:03,670 [myid:1] - INFO [NIOServerCxn.Factory:/127.0.0.2:2181:NIOServerCnxnFactory@227][] - Accepted socket connection from /127.0.0.1:54763
2012-05-23 08:58:03,672 [myid:1] - INFO [QuorumPeerListener:QuorumCnxManager$Listener@530][] - My election bind port: /127.0.0.2:2183
2012-05-23 08:58:03,675 [myid:1] - WARN [NIOServerCxn.Factory:/127.0.0.2:2181:NIOServerCnxn@354][] - Exception causing close of session 0x0 due to java.io.IOException: ZooKeeperServer not running
2012-05-23 08:58:03,675 [myid:1] - DEBUG [QuorumPeer[myid=1]/127.0.0.2:2181 :QuorumPeer@825][] - Starting quorum peer
2012-05-23 08:58:03,675 [myid:1] - DEBUG [NIOServerCxn.Factory:/127.0.0.2:2181:NIOServerCnxn@358][] - IOException stack trace

So the leader socket/port bind appears to hang indefinitely without DNS. As I mentioned before, we are using IP addresses only and in our customer environment we will not always have DNS so we'd really like to remove this requirement... Anyone have ideas where I can start looking to figure this out?

On Tue, May 22, 2012 at 1:19 AM, Flavio Junqueira <[email protected]> wrote:

> Are they able to elect a leader or not even that?
>
> -Flavio
>
> On May 22, 2012, at 6:31 AM, Marshall McMullen wrote:
>
> > In our Linux environment, we're using IP addresses only for all our
> > zookeeper servers. We've observed that without a functioning DNS server,
> > zookeeper peers cannot communicate with one another. We have been able to
> > work around this in the past by putting entries in /etc/hosts for all the
> > zookeeper servers. With entries in /etc/hosts no reverse name lookup is
> > performed and everything works fine.
> >
> > Has anyone else seen this behavior or can confirm/deny whether zookeeper
> > requires (assumes) a functioning DNS server.. ?
> >
> > I've gone through a lot of the quorum code related to IP addresses, and I
> > thought the culprit might be calls to InetAddress.getByName. But looking at
> > the source code for that (at least in openjdk) they return if the given
> > string is an actual IP address. Other thoughts I had were calls to
> > InetSocketAddress(hostname, port), but that looks like it similarly goes
> > through InetAddress so that should be OK.
> >
> > Anyhow, I'll keep digging into this, but any ideas or help would be
> > appreciated!
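
One way to test the reverse-lookup theory outside ZooKeeper is a small standalone probe. The sketch below is only an illustration: the class name ReverseLookupProbe and the helper fromLiteral are hypothetical, and nothing in this thread confirms that a reverse lookup is what actually blocks the election bind. It relies on standard JDK behavior only: InetAddress.getByName does not query DNS for an IP literal, InetSocketAddress.getHostString() (Java 7+) returns the literal without a reverse lookup, and InetAddress.getHostName() may perform one.

```java
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.UnknownHostException;

// Hypothetical probe, not part of ZooKeeper.
class ReverseLookupProbe {

    // Builds a socket address from an IP literal. For a literal,
    // getByName only validates the format and does not contact DNS.
    static InetSocketAddress fromLiteral(String ip, int port) throws UnknownHostException {
        InetAddress addr = InetAddress.getByName(ip);
        return new InetSocketAddress(addr, port);
    }

    public static void main(String[] args) throws Exception {
        // Address and port taken from the log above; override via args[0].
        String ip = args.length > 0 ? args[0] : "127.0.0.2";
        InetSocketAddress sock = fromLiteral(ip, 2183);

        // No reverse lookup: returns the literal the address was created from.
        System.out.println("getHostString: " + sock.getHostString());

        // May trigger a reverse lookup through the system resolver,
        // so time it to see whether it stalls when DNS is unreachable.
        long start = System.nanoTime();
        String name = sock.getAddress().getHostName();
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println("getHostName:   " + name + " (" + elapsedMs + " ms)");
    }
}
```

Running this once with DNS reachable and once without, on the same host, would show whether the second call stalls only in the no-DNS case. A long delay there would be consistent with the /etc/hosts workaround, but it would not by itself prove where ZooKeeper blocks.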
