Well, we're reconstructing a chain of _possibilities_ post-mortem, so there's not much I can say for sure. Mostly just throwing this out there in case it sparks some "aha" moments. Not knowing ZK well, anything I say is speculation.
But I speculate that this isn't really the root of the problem, given that we haven't been seeing the "ClusterState says we are the leader..." error go by the user lists for a while. It may well be a coincidence. The place where this happened reported that the problem "seemed to be better" after adjusting the ZK nodes' times. I know when I reconstruct events like this I'm never sure about cause and effect since I'm usually doing several things at once.

Erick

On Tue, Aug 6, 2013 at 5:51 PM, Chris Hostetter <[email protected]> wrote:

> : > When the times were coordinated, many of the problems with recovery went
> : > away. We're trying to reconstruct the scenario from memory, but it
> : > prompted me to pass the incident in case it sparked any thoughts.
> : > Specifically, I wonder if there's anything that comes to mind if the ZK
> : > nodes are significantly out of synch with each other time-wise.
> :
> : Does this mean that ntp or other strict time synchronization is important for
> : SolrCloud? I strive for this anyway, just to ensure that when I'm researching
> : log files between two machines that I can match things up properly.
>
> I don't know if/how Solr/ZK is affected by having machines with clocks out
> of sync, but i do remember seeing discussions a while back about weird
> things happening to ZK client apps *while* time adjustments are taking
> place to get back in sync.
>
> IIRC: as the local clock starts accelerating and jumping ahead in
> increments to "correct" itself with ntp, then those jumps can confuse the
> ZK code into thinking it's been waiting a lot longer than it really
> has for zk heartbeat (or whatever it's called) and it can trigger a
> timeout situation.
>
> -Hoss
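
To make the clock-jump idea in the quoted message concrete, here is a minimal sketch of how a forward step in the wall clock could look like a missed heartbeat. It is purely illustrative and is not ZooKeeper's or Solr's actual code: HeartbeatMonitor, SteppedWallClock, the 10-second timeout and the 30-second step are all hypothetical names and values made up for the example. The only assumption is a timeout check that subtracts two clock readings.

```python
import time


class HeartbeatMonitor:
    """Hypothetical timeout tracker; not ZooKeeper's actual implementation."""

    def __init__(self, timeout_s, clock):
        self.timeout_s = timeout_s
        self.clock = clock            # callable returning seconds as a float
        self.last_seen = clock()      # time of the most recent heartbeat

    def heartbeat(self):
        # Record that a heartbeat just arrived.
        self.last_seen = self.clock()

    def expired(self):
        # The session looks dead if too much time appears to have passed.
        return self.clock() - self.last_seen > self.timeout_s


class SteppedWallClock:
    """Simulated wall clock that an ntp-style correction can step forward."""

    def __init__(self):
        self.offset = 0.0

    def step(self, seconds):
        # Jump the simulated wall clock ahead by the given amount.
        self.offset += seconds

    def __call__(self):
        return time.monotonic() + self.offset


wall = SteppedWallClock()
by_wall = HeartbeatMonitor(timeout_s=10.0, clock=wall)
by_monotonic = HeartbeatMonitor(timeout_s=10.0, clock=time.monotonic)

wall.step(30.0)  # simulate a 30-second forward correction of the wall clock

print(by_wall.expired())       # True: the jump looks like 30 s of silence
print(by_monotonic.expired())  # False: almost no real time has elapsed
```

The monitor fed by a monotonic clock is unaffected by the step, which is why timeout logic is usually better off measuring elapsed time that way. Whether the ZK client in question behaves like the wall-clock variant is exactly the part I can't vouch for.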
