[ 
https://issues.apache.org/jira/browse/SOLR-12413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16518798#comment-16518798
 ] 

Gus Heck commented on SOLR-12413:
---------------------------------

I did test this manually by
 # creating a 4 node cluster,
 # copying the aliases.json to a file,
 # modifying it to add an alias,
 # bringing the cluster down,
 # deleting aliases.json from zk,
 # uploading the edited version to zk,
 # restarting the cluster.

At that point I observed the change in the UI and successfully queried the 
alias.

The test you supplied doesn't seem to work for me, with or without the patch. 
The deletion of aliases.json appears to blow up the cluster almost 
immediately: the delete triggers the watch and leads to:

 
{code:java}
22756 ERROR (zkCallback-21-thread-1) [ ] o.a.s.c.c.ZkStateReader$AliasesManager 
A ZK error has occurred
org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode 
for /aliases.json
at org.apache.zookeeper.KeeperException.create(KeeperException.java:114) 
~[zookeeper-3.4.11.jar:3.4.11-37e277162d567b55a07d1755f0b31c32e93c01a0]
at org.apache.zookeeper.KeeperException.create(KeeperException.java:54) 
~[zookeeper-3.4.11.jar:3.4.11-37e277162d567b55a07d1755f0b31c32e93c01a0]
at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1215) 
~[zookeeper-3.4.11.jar:3.4.11-37e277162d567b55a07d1755f0b31c32e93c01a0]
at 
org.apache.solr.common.cloud.SolrZkClient.lambda$getData$5(SolrZkClient.java:341)
 ~[java/:?]
at 
org.apache.solr.common.cloud.ZkCmdExecutor.retryOperation(ZkCmdExecutor.java:60)
 ~[java/:?]
at org.apache.solr.common.cloud.SolrZkClient.getData(SolrZkClient.java:341) 
~[java/:?]
at 
org.apache.solr.common.cloud.ZkStateReader$AliasesManager.process(ZkStateReader.java:1781)
 ~[java/:?]
at 
org.apache.solr.common.cloud.SolrZkClient$1.lambda$process$1(SolrZkClient.java:270)
 ~[java/:?]
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) 
~[?:1.8.0_144]
at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[?:1.8.0_144]
at 
org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:209)
 ~[java/:?]
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) 
[?:1.8.0_144]
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) 
[?:1.8.0_144]
at java.lang.Thread.run(Thread.java:748) [?:1.8.0_144]
22756 ERROR (zkCallback-28-thread-1) [ ] o.a.s.c.c.ZkStateReader$AliasesManager 
A ZK error has occurred
org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode 
for /aliases.json
at org.apache.zookeeper.KeeperException.create(KeeperException.java:114) 
~[zookeeper-3.4.11.jar:3.4.11-37e277162d567b55a07d1755f0b31c32e93c01a0]
at org.apache.zookeeper.KeeperException.create(KeeperException.java:54) 
~[zookeeper-3.4.11.jar:3.4.11-37e277162d567b55a07d1755f0b31c32e93c01a0]
at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1215) 
~[zookeeper-3.4.11.jar:3.4.11-37e277162d567b55a07d1755f0b31c32e93c01a0]
at 
org.apache.solr.common.cloud.SolrZkClient.lambda$getData$5(SolrZkClient.java:341)
 ~[java/:?]
at 
org.apache.solr.common.cloud.ZkCmdExecutor.retryOperation(ZkCmdExecutor.java:60)
 ~[java/:?]
at org.apache.solr.common.cloud.SolrZkClient.getData(SolrZkClient.java:341) 
~[java/:?]
at 
org.apache.solr.common.cloud.ZkStateReader$AliasesManager.process(ZkStateReader.java:1781)
 ~[java/:?]
at 
org.apache.solr.common.cloud.SolrZkClient$1.lambda$process$1(SolrZkClient.java:270)
 ~[java/:?]
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) 
~[?:1.8.0_144]
at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[?:1.8.0_144]
at 
org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:209)
 ~[java/:?]
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) 
[?:1.8.0_144]
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) 
[?:1.8.0_144]
at java.lang.Thread.run(Thread.java:748) [?:1.8.0_144]
{code}
followed by
{code:java}
org.apache.solr.client.solrj.SolrServerException: No live SolrServers available 
to handle this request:[http://127.0.0.1:33023/solr/alias1]

at __randomizedtesting.SeedInfo.seed([3A63ED446F3BE85D:C178AC5AAC5AF704]:0)
at 
org.apache.solr.client.solrj.impl.LBHttpSolrClient.request(LBHttpSolrClient.java:462)
at 
org.apache.solr.client.solrj.impl.CloudSolrClient.sendRequest(CloudSolrClient.java:1106)
at 
org.apache.solr.client.solrj.impl.CloudSolrClient.requestWithRetryOnStaleState(CloudSolrClient.java:886)
at 
org.apache.solr.client.solrj.impl.CloudSolrClient.request(CloudSolrClient.java:819)
at org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:194)
at org.apache.solr.client.solrj.SolrClient.query(SolrClient.java:942)
at 
org.apache.solr.cloud.AliasIntegrationTest.testPreExistingAliases(AliasIntegrationTest.java:105)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1737)
{code}
All tests running after it fail, so the test cluster is just hosed at that 
point (not dying horribly when we can't find aliases.json might be a nice 
enhancement/bugfix, but one could also argue that deleting critical files 
while the cluster is running is never a supported use case).
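
A rough sketch of what tolerating a missing aliases.json in the watch callback might look like. This is not Solr's code: {{FakeStore}} and {{TolerantAliasWatcher}} are hypothetical in-memory stand-ins (no real ZooKeeper client calls are used), and the assumed behavior is simply "keep the last known aliases if the node is gone" instead of surfacing a NoNodeException.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical in-memory store standing in for ZooKeeper; not the real client API. */
final class FakeStore {
    private final Map<String, String> nodes = new ConcurrentHashMap<>();

    void put(String path, String data) { nodes.put(path, data); }

    void delete(String path) { nodes.remove(path); }

    /** Returns an empty Optional instead of throwing when the node is missing. */
    Optional<String> read(String path) { return Optional.ofNullable(nodes.get(path)); }
}

/** Hypothetical watch callback that keeps the last known aliases if the node vanishes. */
final class TolerantAliasWatcher {
    private final FakeStore store;
    private volatile String current = "{}"; // assumed empty-aliases default

    TolerantAliasWatcher(FakeStore store) { this.store = store; }

    /** Invoked when the watched path changes; returns true if aliases were refreshed. */
    boolean onChange(String path) {
        Optional<String> data = store.read(path);
        if (!data.isPresent()) {
            // Node was deleted: keep the previous aliases rather than failing.
            return false;
        }
        current = data.get();
        return true;
    }

    String current() { return current; }
}

final class TolerantWatcherDemo {
    public static void main(String[] args) {
        FakeStore store = new FakeStore();
        TolerantAliasWatcher watcher = new TolerantAliasWatcher(store);

        store.put("/aliases.json", "{\"collection\":{\"alias1\":\"c1\"}}");
        System.out.println(watcher.onChange("/aliases.json")); // refreshed

        store.delete("/aliases.json");
        System.out.println(watcher.onChange("/aliases.json")); // not refreshed
        System.out.println(watcher.current());                 // still the last known aliases
    }
}
```

Whether silently keeping stale aliases is preferable to failing loudly is a design question in its own right; the sketch only shows that the callback need not take the node down with it.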

AFAIK there's also no way in ZooKeeper to manually set the version back to zero 
(this is understandable).
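
To make the version problem concrete, here is a minimal sketch of the comparison from the issue description. {{FakeZNode}} and {{AliasHolder}} are hypothetical simplifications, not Solr or ZooKeeper classes; the only assumptions are that a freshly created znode reports data version 0, that each data update increments it, and that the holder starts at version 0 like {{Aliases.EMPTY}}.

```java
/** Hypothetical stand-in for a znode: holds data plus a ZooKeeper-style data version. */
final class FakeZNode {
    private String data;
    private int version; // a freshly created node starts at version 0

    FakeZNode(String data) {
        this.data = data;
        this.version = 0;
    }

    /** Each data update bumps the version by one. */
    void setData(String newData) {
        this.data = newData;
        this.version++;
    }

    String data() { return data; }

    int version() { return version; }
}

/** Simplified holder using the same strict "newer than" check as setIfNewer. */
final class AliasHolder {
    private String aliases = "{}";
    private int version = 0; // same starting version as Aliases.EMPTY

    synchronized boolean setIfNewer(String newAliases, int newVersion) {
        if (Integer.compare(version, newVersion) < 0) {
            aliases = newAliases;
            version = newVersion;
            return true;
        }
        return false; // equal or older versions are ignored
    }

    synchronized String aliases() { return aliases; }
}

final class AliasVersionDemo {
    public static void main(String[] args) {
        AliasHolder holder = new AliasHolder();

        // Deleted and re-created node: version is 0 again, so the content is ignored.
        FakeZNode recreated = new FakeZNode("{\"collection\":{\"alias1\":\"c1\"}}");
        System.out.println(holder.setIfNewer(recreated.data(), recreated.version())); // false

        // Overwritten in place: version is now 1, so the content is accepted.
        recreated.setData("{\"collection\":{\"alias1\":\"c2\"}}");
        System.out.println(holder.setIfNewer(recreated.data(), recreated.version())); // true
    }
}
```

In this model the only route back to version 0 is deleting and re-creating the node, which is the step that breaks the running test cluster.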

I don't think this particular fix is unit testable unless we find a way to 
bring up a truly independent cloud server with an independent ZooKeeper 
instance. Then we might have the ability to stop Solr without stopping 
ZooKeeper. 

> Solr ignores aliases.json from ZooKeeper at startup
> ---------------------------------------------------
>
>                 Key: SOLR-12413
>                 URL: https://issues.apache.org/jira/browse/SOLR-12413
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public(Default Security Level. Issues are Public) 
>          Components: SolrCloud
>    Affects Versions: 7.2.1
>         Environment: A SolrCloud cluster with ZooKeeper (one node is enough 
> to reproduce).
> Solr 7.2.1.
> ZooKeeper 3.4.6.
>            Reporter: Gaël Jourdan
>            Assignee: David Smiley
>            Priority: Major
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> Since upgrading to 7.2.1, we ran into an issue where Solr ignores 
> _aliases.json_ file stored in ZooKeeper.
>  
> +Steps to reproduce the problem:+
>  # SolrCloud cluster is down
>  # Direct update of _aliases.json_ file in ZooKeeper with Solr ZkCLI *without 
> using Collections API* :
>  ** {{java ... org.apache.solr.cloud.ZkCLI -zkhost ... -cmd clear 
> /aliases.json}}
>  ** {{java ... org.apache.solr.cloud.ZkCLI -zkhost ... -cmd put /aliases.json 
> "new content"}}
>  # SolrCloud cluster is started => _aliases.json_ not taken into account
>  
> +Analysis:+ 
> Digging a bit in the code, what is actually causing the issue is that, when 
> starting, Solr now checks for the metadata of the _aliases.json_ file and if 
> the version metadata from ZooKeeper is lower or equal to local version, it 
> keeps the local version.
> When it starts, Solr has a local version of 0 for the aliases but ZooKeeper 
> also has a version of 0 of the file because we just recreated it. So Solr 
> ignores ZooKeeper configuration and never has a chance to load aliases.
>  
> Relevant parts of Solr code are:
>  * 
> [https://github.com/apache/lucene-solr/blob/branch_7_2/solr/solrj/src/java/org/apache/solr/common/cloud/ZkStateReader.java]
>  : line 1562 : method setIfNewer
> {code:java}
> /**
> * Update the internal aliases reference with a new one, provided that its ZK 
> version has increased.
> *
> * @param newAliases the potentially newer version of Aliases
> */
> private boolean setIfNewer(Aliases newAliases) {
>   synchronized (this) {
>     int cmp = Integer.compare(aliases.getZNodeVersion(), 
> newAliases.getZNodeVersion());
>     if (cmp < 0) {
>       LOG.debug("Aliases: cmp={}, new definition is: {}", cmp, newAliases);
>       aliases = newAliases;
>       this.notifyAll();
>       return true;
>     } else {
>       LOG.debug("Aliases: cmp={}, not overwriting ZK version.", cmp);
>       assert cmp != 0 || Arrays.equals(aliases.toJSON(), newAliases.toJSON()) 
> : aliases + " != " + newAliases;
>       return false;
>     }
>   }
> }{code}
>  * 
> [https://github.com/apache/lucene-solr/blob/branch_7_2/solr/solrj/src/java/org/apache/solr/common/cloud/Aliases.java]
>  : line 45 : the "empty" Aliases object with default version 0
> {code:java}
> /**
> * An empty, minimal Aliases primarily used to support the non-cloud solr use 
> cases. Not normally useful
> * in cloud situations where the version of the node needs to be tracked even 
> if all aliases are removed.
> * A version of 0 is provided rather than -1 to minimize the possibility that 
> if this is used in a cloud
> * instance data is written without version checking.
> */
> public static final Aliases EMPTY = new Aliases(Collections.emptyMap(), 
> Collections.emptyMap(), 0);{code}
>  
> Note that a workaround is to force ZooKeeper to always have a version greater 
> than 0 for _aliases.json_ file (for instance by not clearing the file and 
> just overwriting it again and again).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
