BasicZkTest has the following bit of code that I'm tripping on.

    zkServer.shutdown();

    // document indexing shouldn't stop immediately after a ZK disconnect
    assertU(adoc("id", "201"));

    Thread.sleep(300);

    // try a reconnect from disconnect
    zkServer = new ZkTestServer(zkDir, zkPort);
    zkServer.run(false);

It's not entirely clear to me that the assertU right after the shutdown
should always succeed.
ZkStateReader has means to cache and watch various bits of information, but
if it hasn't done the caching yet it may need to talk to zk before
completing the request. I am trying to use Collection Properties as an
alternative location for looking up the routed alias for a collection.
Current code uses a core property, but this is inconvenient for testing as
it can't be altered in the test... or at least I didn't find a way to alter
it. Also, future features, such as archiving older collections from a TRA,
might find it useful to be able to disconnect the older collections from
the alias, but right now that would require finding all cores and editing
properties for all of them.

However, BasicZkTest fails on this assert because fetching the properties
fails and throws an exception.
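
To make the failure mode concrete, here is a minimal Java sketch of a
cache-then-fetch lookup. It is not Solr code: CachedPropsSketch, the
remoteFetch function, and the "aliasName" key used in the comments are all
hypothetical stand-ins for whatever ZkStateReader actually does internally.
It only illustrates that a lookup served from a warm cache survives an
outage, while a cache miss has to reach the backing store and fails.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/** Hypothetical illustration only; not a Solr class. */
class CachedPropsSketch {
    // Properties already fetched, keyed by collection name.
    private final Map<String, Map<String, String>> cache = new ConcurrentHashMap<>();

    // Hypothetical stand-in for a ZooKeeper read. It is assumed to throw
    // IllegalStateException when the backing store is unreachable.
    private final Function<String, Map<String, String>> remoteFetch;

    CachedPropsSketch(Function<String, Map<String, String>> remoteFetch) {
        this.remoteFetch = remoteFetch;
    }

    /** Returns cached properties, e.g. {"aliasName": ...}, or fetches them on a miss. */
    Map<String, String> get(String collection) {
        Map<String, String> cached = cache.get(collection);
        if (cached != null) {
            return cached; // warm cache: no remote call needed
        }
        // Cache miss: this call propagates the exception if the store is down,
        // and nothing is cached in that case.
        Map<String, String> fetched = remoteFetch.apply(collection);
        cache.put(collection, fetched);
        return fetched;
    }
}
```

If get() is called for a collection once while the store is up, later calls
for that collection keep working after the store goes away. A first call for
any other collection during the outage throws, which is the situation the
assert runs into when the properties have not been cached yet.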

So is this assert really reasonable? It kind of feels unreasonable, but I'd
like some background from other folks here.
https://issues.apache.org/jira/browse/SOLR-7819 seems to have discussed
this some, but the more I think about it, the more I'm convinced that
proceeding without zookeeper available is dangerous. For example, any
update sent to an alias (TRA/CRA or regular) will need to check zookeeper.
Also, security.json is in zookeeper, so anyone running with security on
probably tries to hit zookeeper on a cache miss too.

I guess it comes down to the question of whether solr cloud should work
while zookeeper is down or unavailable. This is the first I've run
into the notion that the answer might be yes. I'd always presumed that if
Zk went away all bets were off, because ZK is what makes a cloud out of us.

What I don't know is what existing use cases/installs might find this
assert critical. (Most of the above bug talked about LIR, and the comment on
the commit mentions leader election.)

Thoughts?

-Gus
