OK, I think I may have found the culprit. There are lots of places where we call InetSocketAddress.getHostName(). The Javadoc doesn't say so, but looking at the openjdk source code, getHostName triggers a reverse DNS lookup whenever the address was created from an IP literal, which is exactly our case.
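For anyone following along, here is a minimal sketch of the distinction, not the actual ZooKeeper change. It assumes the address was built from an IP literal, as ours are. The class name HostStringDemo and the helper hostWithoutLookup are made up for illustration; getAddress(), getHostAddress() and getHostName() are standard java.net calls.

```java
import java.net.InetAddress;
import java.net.InetSocketAddress;

// Hypothetical demo class, not part of ZooKeeper.
class HostStringDemo {

    // Hypothetical helper: returns a host string without consulting DNS.
    static String hostWithoutLookup(InetSocketAddress addr) {
        InetAddress ia = addr.getAddress();
        if (ia != null) {
            // Resolved address: getHostAddress() only formats the raw IP bytes.
            return ia.getHostAddress();
        }
        // Unresolved address: getHostName() returns the stored name as-is.
        return addr.getHostName();
    }

    public static void main(String[] args) {
        // Constructing from an IP literal does not need a name service.
        InetSocketAddress addr = new InetSocketAddress("127.0.0.2", 2183);

        // No lookup here.
        System.out.println(hostWithoutLookup(addr));

        // toString() does not force a lookup either.
        System.out.println(addr);

        // This is the call that can block when no DNS server is reachable:
        // System.out.println(addr.getHostName());
    }
}
```

On Java 7 and later, InetSocketAddress.getHostString() should be another way to get the host part without the reverse lookup, if requiring that JDK version is acceptable.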
I'm going to try modifying these to just use the toString function (which doesn't do the lookup) and see if I get past this problem. I'll update with progress.

On Wed, May 23, 2012 at 9:02 AM, Marshall McMullen <[email protected]> wrote:

> Sorry, had jury duty yesterday so wasn't able to respond until now....
>
> No, leader election fails as well. What's interesting is it looks like the
> bind of the leader socket/port itself is never completing. I enable
> tracing, and without a valid DNS server running, all zookeeper servers just
> hang at startup. In the trace file, the last thing printed in every log
> file before the hang is:
>
> 2012-05-23 08:49:29,882 [myid:1] - INFO [main:QuorumPeerMain@131][] - Starting quorum peer
> 2012-05-23 08:49:29,893 [myid:1] - INFO [main:NIOServerCnxnFactory@108][] - binding to port /127.0.0.2:2181
> 2012-05-23 08:49:29,899 [myid:1] - INFO [main:QuorumPeer@1107][] - tickTime set to 2000
> 2012-05-23 08:49:29,899 [myid:1] - INFO [main:QuorumPeer@1127][] - minSessionTimeout set to -1
> 2012-05-23 08:49:29,899 [myid:1] - INFO [main:QuorumPeer@1138][] - maxSessionTimeout set to -1
> 2012-05-23 08:49:29,899 [myid:1] - INFO [main:QuorumPeer@1153][] - initLimit set to 10
>
> Now, if I repeat the test with a functioning DNS server running, it churns
> along as expected:
>
> 2012-05-23 08:58:03,468 [myid:1] - INFO [main:QuorumPeerMain@131][] - Starting quorum peer
> 2012-05-23 08:58:03,479 [myid:1] - INFO [main:NIOServerCnxnFactory@108][] - binding to port /127.0.0.2:2181
> 2012-05-23 08:58:03,485 [myid:1] - INFO [main:QuorumPeer@1107][] - tickTime set to 2000
> 2012-05-23 08:58:03,485 [myid:1] - INFO [main:QuorumPeer@1127][] - minSessionTimeout set to -1
> 2012-05-23 08:58:03,485 [myid:1] - INFO [main:QuorumPeer@1138][] - maxSessionTimeout set to -1
> 2012-05-23 08:58:03,485 [myid:1] - INFO [main:QuorumPeer@1153][] - initLimit set to 10
> 2012-05-23 08:58:03,665 [myid:1] - INFO [main:QuorumPeer@620][] - currentEpoch not found! Creating with a reasonable default of 0. This should only happen when you are upgrading your installation
> 2012-05-23 08:58:03,666 [myid:1] - INFO [main:QuorumPeer@635][] - acceptedEpoch not found! Creating with a reasonable default of 0. This should only happen when you are upgrading your installation
> 2012-05-23 08:58:03,670 [myid:1] - INFO [NIOServerCxn.Factory:/127.0.0.2:2181:NIOServerCnxnFactory@227][] - Accepted socket connection from /127.0.0.1:54763
> 2012-05-23 08:58:03,672 [myid:1] - INFO [QuorumPeerListener:QuorumCnxManager$Listener@530][] - My election bind port: /127.0.0.2:2183
> 2012-05-23 08:58:03,675 [myid:1] - WARN [NIOServerCxn.Factory:/127.0.0.2:2181:NIOServerCnxn@354][] - Exception causing close of session 0x0 due to java.io.IOException: ZooKeeperServer not running
> 2012-05-23 08:58:03,675 [myid:1] - DEBUG [QuorumPeer[myid=1]/127.0.0.2:2181:QuorumPeer@825][] - Starting quorum peer
> 2012-05-23 08:58:03,675 [myid:1] - DEBUG [NIOServerCxn.Factory:/127.0.0.2:2181:NIOServerCnxn@358][] - IOException stack trace
>
> So the leader socket/port bind appears to hang indefinitely without DNS.
> As I mentioned before, we are using IP addresses only and in our customer
> environment we will not always have DNS so we'd really like to remove this
> requirement...Anyone have ideas where I can start looking to figure this
> out?
>
> On Tue, May 22, 2012 at 1:19 AM, Flavio Junqueira <[email protected]> wrote:
>
>> Are they able to elect a leader or not even that?
>>
>> -Flavio
>>
>> On May 22, 2012, at 6:31 AM, Marshall McMullen wrote:
>>
>> > In our Linux environment, we're using IP addresses only for all our
>> > zookeeper servers. We've observed that without a functioning DNS server,
>> > zookeeper peers cannot communicate with one another. We have been able to
>> > work around this in the past by putting entries in /etc/hosts for all the
>> > zookeeper servers. With entries in /etc/hosts no reverse name lookup is
>> > performed and everything works fine.
>> >
>> > Has anyone else seen this behavior or can confirm/deny whether zookeeper
>> > requires (assumes) a functioning DNS server.. ?
>> >
>> > I've gone through a lot of the quorum code related to IP addresses, and I
>> > thought the culprit might be calls to InetAddress.getByName. But looking at
>> > the source code for that (at least in openjdk) they return if the given
>> > string is an actual IP address. Other thoughts I had were calls to
>> > InetSocketAddress(hostname, port), but that looks like it similarly goes
>> > through InetAddress so that should be OK.
>> >
>> > Anyhow, I'll keep digging into this, but any ideas or help would be
>> > appreciated!
