The issue resolved itself without any change to the test once I realized it might be important to cache collection properties, though that's now under discussion (see https://issues.apache.org/jira/browse/SOLR-13418 and https://issues.apache.org/jira/browse/SOLR-13420 if you're interested).
Might still be good to change the test, though, since it seems to promise more than can be delivered?

On Fri, Apr 26, 2019 at 10:46 AM David Smiley <[email protected]> wrote:

> I agree with Erick's response, and thus the test/assertion seems
> unreasonable.
>
> If ZK is down, all bets are off on indexing proceeding. In practice,
> people expect searches to continue for some time at least.
>
> ~ David Smiley
> Apache Lucene/Solr Search Developer
> http://www.linkedin.com/in/davidwsmiley
>
>
> On Mon, Apr 22, 2019 at 1:54 PM Erick Erickson <[email protected]>
> wrote:
>
>> On the surface, I’m automatically suspicious of _anything_ that relies on
>> an arbitrary wait period for a state to settle down. Would this 300ms sleep
>> be adequate on a very fast machine running just one test?
>>
>> I don’t see the value of that assert anyway. I can’t come up with a use-case
>> for a running Solr functioning incorrectly because it failed to update a
>> document while ZooKeeper was shutting down.
>>
>> FWIW
>> Erick
>>
>> > On Apr 22, 2019, at 8:42 AM, Gus Heck <[email protected]> wrote:
>> >
>> > BasicZkTest has the following bit of code, that I'm tripping on.
>> >
>> >     zkServer.shutdown();
>> >
>> >     // document indexing shouldn't stop immediately after a ZK disconnect
>> >     assertU(adoc("id", "201"));
>> >
>> >     Thread.sleep(300);
>> >
>> >     // try a reconnect from disconnect
>> >     zkServer = new ZkTestServer(zkDir, zkPort);
>> >     zkServer.run(false);
>> >
>> > It's not entirely clear to me that this should always be true.
>> > ZkStateReader has means to cache and watch various bits of information, but
>> > if it hasn't done the caching yet it may need to talk to zk before
>> > completing the request. I am trying to use Collection Properties as an
>> > alternative location for looking up the routed alias for a collection.
>> > Current code uses a core property, but this is inconvenient for testing as
>> > it can't be altered in the test... or at least I didn't find a way to alter
>> > it. Also, future features such as archiving older collections from a TRA,
>> > might find it useful to be able to disconnect the older collections from
>> > the alias, but right now that would require finding all cores and editing
>> > properties for all of them...
>> >
>> > However BasicZkTest fails on this assert, because the fetching of
>> > properties fails, throwing an exception.
>> >
>> > So is this assert really reasonable? It kind of feels unreasonable but
>> > I'd like some background from other folks here...
>> > https://issues.apache.org/jira/browse/SOLR-7819 seems to have discussed
>> > this some, but the more I think about it, the more I'm convinced that
>> > proceeding without zookeeper available seems dangerous. Any update sent to
>> > an alias (TRA/CRA or regular) will need to check zookeeper for example....
>> > Also security.json is in zookeeper, so anyone running with security on
>> > probably tries to hit zookeeper on a cache miss too.
>> >
>> > I guess it comes down to the question of whether or not solr cloud
>> > should work while zookeeper is down/unavail or not. This is the first I've
>> > run into the notion that the answer might be yes. I'd always presumed that
>> > if Zk went away all bets were off, because ZK is what makes a cloud out of
>> > us.
>> >
>> > What I don't know is what existing use cases/installs might find this
>> > assert critical (most of the above bug talked about LIR, and the comment on
>> > the commit mentions leader election)
>> >
>> > Thoughts?
>> >
>> > -Gus
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: [email protected]
>> For additional commands, e-mail: [email protected]
>>
>>

-- 
http://www.needhamsoftware.com (work)
http://www.the111shift.com (play)
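
On Erick's point about the fixed 300 ms sleep: one common alternative is to poll a condition until a deadline rather than wait an arbitrary period. The following is only a minimal sketch of that idea. `PollUntil` and `waitFor` are hypothetical names made up for illustration; they are not part of Solr's test framework, and the condition in `main` is a stand-in, not a real ZooKeeper check.

```java
import java.util.function.BooleanSupplier;

// Hypothetical helper for illustration only; not part of Solr or its test framework.
final class PollUntil {
  private PollUntil() {}

  /**
   * Checks the condition every pollMs milliseconds until it returns true
   * or timeoutMs milliseconds have elapsed.
   * Returns true if the condition was met in time, false on timeout or interrupt.
   */
  static boolean waitFor(BooleanSupplier condition, long timeoutMs, long pollMs) {
    final long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
    while (true) {
      if (condition.getAsBoolean()) {
        return true;
      }
      long remainingNanos = deadline - System.nanoTime();
      if (remainingNanos <= 0) {
        return false;
      }
      try {
        // Sleep for the poll interval, but never past the deadline.
        Thread.sleep(Math.min(pollMs, Math.max(1L, remainingNanos / 1_000_000L)));
      } catch (InterruptedException e) {
        // Preserve the interrupt status for the caller and give up.
        Thread.currentThread().interrupt();
        return false;
      }
    }
  }

  public static void main(String[] args) {
    long start = System.currentTimeMillis();
    // Stand-in condition: becomes true roughly 50 ms after start.
    boolean met = waitFor(() -> System.currentTimeMillis() - start >= 50, 2_000, 10);
    System.out.println("condition met: " + met);
  }
}
```

A test using something like this would wait only as long as needed on a fast machine and would fail with a clear timeout on a slow one, instead of depending on 300 ms happening to be enough.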
