I believe all tests still run against a single-node ZK cluster. If that’s still the case, ZK consistency shouldn’t matter here.
It’s been a long while since I’ve looked into that particular doc/issue, but even with more than one ZK instance I believe it is only an issue in a fairly specific case: a client does something with ZK, assumes it’s done, and then triggers something else on the assumption that the change has been made. That something else may not see the change, though normally that would require it to be using a different ZK client instance. Unfortunately, we don’t currently always use a single ZK client per node, but even so, this is not a normal pattern. Most Solr ZK usage should not have a problem with this case, as most behavior is either driven directly by notifications from ZooKeeper or does not trigger something else on this assumption.

Mark

On Sun, Sep 26, 2021 at 8:24 AM David Smiley wrote:

> This drives me crazy too.
>
> +1 to Ilan's point. For a CloudSolrClient, it's state knowledge should
> merely be a hint and not the final word -- need to go to ZK for that. For
> the HTTP based ClusterStateProvider, the receiving Solr side needs to use
> non-cached information -- must go to ZK always (maybe toggle-able with a
> param if need be).
>
> Still, here's a public service announcement on a guarantee that ZooKeeper
> does *not* have:
> https://zookeeper.apache.org/doc/r3.5.9/zookeeperProgrammers.html#ch_zkGuarantees
> see lack of "Simultaneously Consistent Cross-Client Views" in the note.
> After reading this (and being shocked by its implications), I added
> https://github.com/apache/solr/blob/122c88a0748769432ef62cc3fb94c2226dd67aa7/solr/solrj/src/java/org/apache/solr/common/cloud/ZkStateReader.java#L2071
> And I also tried to highlight this... seems maybe not the dev list (I can't
> find it now) but at least in JIRA somewhere.
> So maybe all ClusterStateProviders need to ask that a Zk "sync" is called
> to guarantee the view is up-to-date? I'm not sure what the cost is but it
> may be a cost we can't safely avoid.
>
> ~ David Smiley
> Apache Lucene/Solr Search Developer
> http://www.linkedin.com/in/davidwsmiley
>
>
> On Wed, Sep 22, 2021 at 6:26 PM Ilan Ginzburg wrote:
>
>> Not sure Gus I would blame the create collection code. To the best of my
>> recollection, when the create collection call returns the collection IS
>> fully created.
>> This doesn't mean though (and that's the problem IMO) that the cluster
>> state on the node that issued the collection creation call is aware of it:
>> its cache of cluster state is updated async at a later point once Zookeeper
>> watches decide it's time).
>>
>> I would tend to blame the way cluster state is managed in general in the
>> cluster.
>>
>> I didn't look at this test specifically, so the actual issue might still
>> be different.
>>
>> Ilan
>>
>> On Wed, Sep 22, 2021 at 5:37 PM Gus Heck wrote:
>>
>>> why it often can’t find the collection it’s currently supposed to be
>>>> creating
>>>
>>>
>>> This sounds like things that pestered us while writing TRA tests. IIRC
>>> the problem basically comes from 2 things: 1) we return from create
>>> collection before the collection is fully created and ready to use, 2)
>>> watching code to determine when it IS ready is non-trivial. I think #1 is
>>> the real problem and #2 is a bandaid that shouldn't be needed.
>>>
>>> I think I recall mark previously ranting about how insane and terrible
>>> it would be if an RDBMS did this with CREATE TABLE...
>>>
>>> On Wed, Sep 22, 2021 at 11:24 AM Ishan Chattopadhyaya wrote:
>>>
>>>> Sure, Mark.
>>>> Noble or I will get to this at the earliest, hopefully by end of this
>>>> week.
>>>> Unfortunately, I haven't been paying attention to test failures lately.
>>>>
>>>> On Wed, Sep 22, 2021 at 8:09 PM Mark Miller wrote:
>>>>
>>>>> Perhaps I just have a unique test running experience, but this test
>>>>> has been an outlier failure test in my test runs for months. Given that
>>>>> it’s newer than most tests, I imagine it’s attention grabbing days are on
>>>>> a downslope, so here is a poke if someone wants to check out why it often
>>>>> can’t find the collection it’s currently supposed to be creating.
>>>>>
>>>>>
>>>>> --
>>>>> - Mark
>>>>>
>>>>> http://about.me/markrmiller
>>>>>
>>>>
>>>
>>> --
>>> http://www.needhamsoftware.com (work)
>>> http://www.the111shift.com (play)
>>>
>>
--
- Mark
http://about.me/markrmiller
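
For anyone trying to picture the specific case described at the top of this message (one client writes, then something else reads through a different client and does not see the change), here is a rough toy model in plain Java. It is only a sketch under loud assumptions: `Leader`, `Server`, `Client`, and `catchUp` are hypothetical names made up for illustration, not ZooKeeper or Solr APIs, and the "sync" here just copies a map rather than doing what a real ZooKeeper sync does. It is meant to show the shape of the problem and of the sync-before-read idea David raises, nothing more.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model only: none of these classes are ZooKeeper or Solr APIs.
class StaleReadSketch {

    /** Hypothetical stand-in for the ZK leader: holds every committed write. */
    static class Leader {
        final Map<String, String> committed = new HashMap<>();
    }

    /** Hypothetical stand-in for a ZK server answering reads from a local copy that may lag. */
    static class Server {
        final Leader leader;
        final Map<String, String> local = new HashMap<>();

        Server(Leader leader) {
            this.leader = leader;
        }

        /** Bring the local copy up to date with everything committed so far. */
        void catchUp() {
            local.putAll(leader.committed);
        }
    }

    /** Hypothetical stand-in for one ZK client session, pinned to a single server. */
    static class Client {
        final Server server;

        Client(Server server) {
            this.server = server;
        }

        /** Commit a write; the writer's own server has applied it by the time this returns. */
        void write(String path, String data) {
            server.leader.committed.put(path, data);
            server.catchUp();
        }

        /** Read from this client's server only; other servers are never consulted. */
        String read(String path) {
            return server.local.get(path);
        }

        /** Ask this client's server to catch up before the next read. */
        void sync() {
            server.catchUp();
        }
    }

    public static void main(String[] args) {
        Leader leader = new Leader();
        Client writer = new Client(new Server(leader));
        Client reader = new Client(new Server(leader)); // a different server, so it can lag

        writer.write("/collections/test", "created");

        System.out.println("writer sees: " + writer.read("/collections/test"));
        System.out.println("reader before sync: " + reader.read("/collections/test"));
        reader.sync();
        System.out.println("reader after sync: " + reader.read("/collections/test"));
    }
}
```

In this model, if both clients are constructed around the same `Server` instance (the single-node test setup), the second read sees the write immediately, which is why a one-node ZK cluster should not hit this.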
