Not this, right?

http://bugs.java.com/bugdatabase/view_bug.do?bug_id=6900441
http://osdir.com/ml/hotspot-runtime-dev-java/2013-09/msg00006.html
https://bbossola.wordpress.com/2013/09/04/jvm-issue-concurrency-is-affected-by-changing-the-date-of-the-system/
Patrick

On Mon, Mar 23, 2015 at 5:00 PM, Jared Cantwell <jared.cantw...@gmail.com> wrote:
> Greetings,
>
> We just saw this problem again, and this time we were able to capture a
> core file of the JVM using gdb. I've run it through jstack and jmap to get
> a heap profile. I can see that the FollowerZooKeeperServer has
> a requestsInProcess member that is ~24K. I can also see that the
> CommitProcessor's queuedRequests list has the 24K items in it, so the
> FinalRequestProcessor's processRequest function is never getting called
> to complete the requests.
>
> The CommitProcessor's run() is doing this:
>
> Thread 23510: (state = BLOCKED)
>  - java.lang.Object.wait(long) @bci=0 (Compiled frame; information may be imprecise)
>  - org.apache.zookeeper.server.quorum.CommitProcessor.run() @bci=165, line=182 (Compiled frame)
>
> Based on the state, it made it to wait() because isWaitingForCommit()==true
> && committedRequests.isEmpty()==true.
>
> Strangely, once we detached from the JVM, it must have woken up this thread,
> and the queue flushed out as expected, bringing everything back to normal.
>
> I'll keep digging, but any help or direction would be appreciated, as I'm
> not very familiar with this area of the codebase.
>
> Thanks!
> Jared
>
>
> On Tue, Feb 17, 2015 at 2:38 PM, Flavio Junqueira <fpjunque...@yahoo.com.invalid> wrote:
>
>> It doesn't ring a bell, but it might be worth having a look at the logs to
>> see if there is anything unusual.
>>
>> Just to clarify, was the number of outstanding requests growing or constant?
>> I suppose the server was following/leading and operations were going
>> through; otherwise it would have dropped its connection to the leader or
>> given up leadership.
>>
>> -Flavio
>>
>> > On 17 Feb 2015, at 18:01, Marshall McMullen <marshall.mcmul...@gmail.com> wrote:
>> >
>> > Greetings,
>> >
>> > We saw an issue recently that I've never seen before and am hoping I can
>> > get some clarity on what may cause it and whether it's a known issue. We
>> > had a 5-node ensemble and were unable to connect to one of the ZooKeeper
>> > instances. When trying to connect with zkCli it would time out. When I
>> > connected via telnet and issued the srvr four-letter word, I was surprised
>> > to see that this one server reported a massive number of 'Outstanding'
>> > requests. I'd never seen that be anything other than 0 before. The
>> > ZK dev guide says:
>> >
>> > "outstanding is the number of queued requests, this increases when the
>> > server is under load and is receiving more sustained requests than it can
>> > process, ie the request queue". I looked at all the ZK servers in my
>> > ensemble:
>> >
>> > for ip in 101 102 103 104 105; do echo srvr | nc 172.21.20.${ip} 2181 | grep Outstanding; done
>> > Outstanding: 0
>> > Outstanding: 0
>> > Outstanding: 0
>> > Outstanding: 0
>> > Outstanding: 18876
>> >
>> > I eventually killed ZK on the affected server; everything corrected
>> > itself, Outstanding went to zero, and I was able to connect again.
>> >
>> > Is this something anyone's familiar with? I have logs if they would be
>> > helpful.
>> >
>> > Thanks!
>>
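The reports Patrick links to at the top of the thread describe, as we read them, HotSpot on Linux computing timed-wait deadlines from the time-of-day clock, so changing the system date could make a timed wait or park last far longer than requested. That is a JVM-internal problem that application code cannot fix, but the same hazard appears in Java code that computes timeouts from System.currentTimeMillis(). Below is a minimal sketch of the usual Java-level practice, using a hypothetical MonotonicWait class (not from ZooKeeper or the JDK): take the deadline from System.nanoTime(), which the JDK documents as unrelated to wall-clock time, and recompute the remaining time after every wakeup.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical helper for illustration only; not part of ZooKeeper or the JDK.
final class MonotonicWait {
    private final Object lock = new Object();
    private boolean ready; // guarded by lock

    /** Sets the condition and wakes every waiter. */
    void signal() {
        synchronized (lock) {
            ready = true;
            lock.notifyAll();
        }
    }

    /** Returns true if signalled before the timeout, false if the timeout elapsed first. */
    boolean await(long timeout, TimeUnit unit) throws InterruptedException {
        // Deadline on the monotonic clock, so wall-clock changes do not move it.
        final long deadline = System.nanoTime() + unit.toNanos(timeout);
        synchronized (lock) {
            while (!ready) { // loop because Object.wait may wake spuriously
                long remaining = deadline - System.nanoTime(); // difference is overflow-safe
                if (remaining <= 0) {
                    return false;
                }
                long millis = TimeUnit.NANOSECONDS.toMillis(remaining);
                int nanos = (int) (remaining - TimeUnit.MILLISECONDS.toNanos(millis));
                // remaining > 0, so millis and nanos are never both 0 (wait(0, 0) means "forever").
                lock.wait(millis, nanos);
            }
            return true;
        }
    }

    public static void main(String[] args) throws InterruptedException {
        MonotonicWait w = new MonotonicWait();
        System.out.println(w.await(50, TimeUnit.MILLISECONDS)); // false: nobody signals
        new Thread(w::signal).start();
        System.out.println(w.await(5, TimeUnit.SECONDS)); // true once the other thread signals
    }
}
```

JDK classes such as those built on AbstractQueuedSynchronizer follow the same nanoTime-deadline pattern, so code that uses java.util.concurrent rather than raw wait/notify gets this behaviour without writing it by hand.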
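Jared's reading of the stack trace (the CommitProcessor blocked in wait() because isWaitingForCommit() is true and committedRequests is empty) describes a standard guarded wait. The hypothetical CommitLoopModel below is a heavily simplified model, not ZooKeeper's CommitProcessor; it only borrows a few names from the thread. It shows the invariant such a loop relies on: the consumer re-checks its guard in a while loop, and every producer must change the guarded state while holding the same monitor and then call notifyAll(). A code path that queued a commit without notifying would leave the consumer parked with work waiting, a symptom like the one described, although this sketch does not show that this is what happened in ZooKeeper.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

// Hypothetical, simplified model of a commit-processing loop; NOT ZooKeeper's CommitProcessor.
final class CommitLoopModel {
    private final Queue<String> queuedRequests = new ArrayDeque<>();    // guarded by this
    private final Queue<String> committedRequests = new ArrayDeque<>(); // guarded by this
    private String nextPending;                                         // request awaiting its commit
    private final List<String> completed = new ArrayList<>();           // guarded by this

    /** Producer: a client request arrives. State change and notify happen under the monitor. */
    synchronized void submit(String request) {
        queuedRequests.add(request);
        notifyAll();
    }

    /** Producer: the commit for a request arrives. Dropping notifyAll() here would strand the consumer. */
    synchronized void commit(String request) {
        committedRequests.add(request);
        notifyAll();
    }

    private boolean isWaitingForCommit() {
        return nextPending != null;
    }

    /** Consumer: takes one request and completes it once its commit arrives. */
    synchronized void processOne() throws InterruptedException {
        while (queuedRequests.isEmpty()) {
            wait();
        }
        nextPending = queuedRequests.poll();
        // The state seen in the jstack output: waiting for a commit while none is queued.
        while (isWaitingForCommit() && committedRequests.isEmpty()) {
            wait(); // releases the monitor so commit() can run
        }
        String committed = committedRequests.poll();
        if (!committed.equals(nextPending)) {
            throw new IllegalStateException("unexpected commit: " + committed);
        }
        completed.add(nextPending); // stands in for handing off to the final processor
        nextPending = null;
    }

    synchronized List<String> completed() {
        return new ArrayList<>(completed);
    }

    public static void main(String[] args) throws InterruptedException {
        CommitLoopModel model = new CommitLoopModel();
        Thread consumer = new Thread(() -> {
            try {
                model.processOne();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        consumer.start();
        model.submit("setData /a");
        Thread.sleep(100); // usually lets the consumer reach the commit wait (not guaranteed)
        model.commit("setData /a");
        consumer.join();
        System.out.println(model.completed()); // [setData /a]
    }
}
```

The result does not depend on timing: if the commit arrives before the consumer reaches its second loop, the guard is already false and the consumer proceeds without waiting.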
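Marshall's shell loop checks each server by piping srvr through nc. For a monitoring job the same check can be written in Java. The sketch below uses hypothetical class and method names (OutstandingCheck, srvr, parseOutstanding). It opens a TCP connection to the client port (2181, as in the thread), writes the four bytes srvr, reads until the server closes the connection, and extracts the number on the Outstanding: line. It assumes the server answers four-letter words on the client port, as it did in the thread, and sets connect and read timeouts so that a wedged server yields an error rather than a hang.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

// Hypothetical monitoring helper mirroring: echo srvr | nc <host> 2181 | grep Outstanding
final class OutstandingCheck {
    /** Returns the value on the "Outstanding:" line of srvr output, or -1 if there is none. */
    static long parseOutstanding(String srvrOutput) {
        for (String line : srvrOutput.split("\\R")) { // \R matches \n, \r\n, etc.
            if (line.startsWith("Outstanding:")) {
                return Long.parseLong(line.substring("Outstanding:".length()).trim());
            }
        }
        return -1;
    }

    /** Sends the srvr four-letter word and returns the full response text. */
    static String srvr(String host, int port, int timeoutMillis) throws IOException {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            socket.setSoTimeout(timeoutMillis); // read timeout: fail instead of blocking forever
            OutputStream out = socket.getOutputStream();
            out.write("srvr".getBytes(StandardCharsets.US_ASCII));
            out.flush();
            BufferedReader in = new BufferedReader(
                    new InputStreamReader(socket.getInputStream(), StandardCharsets.US_ASCII));
            StringBuilder response = new StringBuilder();
            String line;
            while ((line = in.readLine()) != null) { // the server closes the connection when done
                response.append(line).append('\n');
            }
            return response.toString();
        }
    }

    public static void main(String[] args) {
        if (args.length == 0) {
            System.err.println("usage: java OutstandingCheck host [host ...]");
            return;
        }
        for (String host : args) {
            try {
                System.out.println(host + " Outstanding: " + parseOutstanding(srvr(host, 2181, 5000)));
            } catch (IOException e) {
                System.out.println(host + " error: " + e);
            }
        }
    }
}
```

For example, `java OutstandingCheck.java 172.21.20.101 172.21.20.105` (Java 11+ single-file launch) checks two of the hosts from the loop; a value that keeps climbing on one server while the others stay at 0 is the pattern reported here.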