[ 
https://issues.apache.org/jira/browse/SOLR-14503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17112472#comment-17112472
 ] 

Colvin Cowie edited comment on SOLR-14503 at 5/20/20, 5:57 PM:
---------------------------------------------------------------

I see {{ZkFailoverTest}} was added for SOLR-5129, but because it does 
{{Thread.sleep(5000);}} with {{waitForZk}} set to 60, it doesn't stop the zk 
server for long enough to exceed either the configured timeout or the 
unconfigured DEFAULT_CLIENT_CONNECT_TIMEOUT of 30 seconds.
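
To make the timing argument concrete, here is a rough sketch in plain Java. It is not Solr code: the class name {{ZkOutageCheck}} and the method {{exceeds}} are made up for illustration, and the only values taken from the test are the 5 second sleep, the 30 second default and the 60 second {{waitForZk}}.

```java
import java.time.Duration;

public class ZkOutageCheck {
    // Values quoted in the comment above; everything else here is hypothetical.
    static final Duration OUTAGE = Duration.ofSeconds(5);                   // Thread.sleep(5000) in the test
    static final Duration DEFAULT_CONNECT_TIMEOUT = Duration.ofSeconds(30); // unconfigured default
    static final Duration WAIT_FOR_ZK = Duration.ofSeconds(60);             // configured in the test

    /** Returns true if the outage lasts longer than the timeout, i.e. the timeout path would be hit. */
    static boolean exceeds(Duration outage, Duration timeout) {
        return outage.compareTo(timeout) > 0;
    }

    public static void main(String[] args) {
        System.out.println("exceeds default timeout: " + exceeds(OUTAGE, DEFAULT_CONNECT_TIMEOUT));
        System.out.println("exceeds waitForZk:       " + exceeds(OUTAGE, WAIT_FOR_ZK));
    }
}
```

Both checks come out false for a 5 second outage, so the test passes whichever of the two timeouts is actually in effect. An outage longer than 30 seconds but shorter than 60 would be needed to tell them apart.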

-I've tried modifying the test to cover both a successful start and the 
configured timeout being exceeded, but I can't quite get it to work with both 
cases at the same time since I seem to end up with the server dead when the 
second test starts, and I'm not familiar enough with the way these tests are 
written to know what the right way to write them is.-

-If I simply duplicate the existing test method so that there are two test cases 
doing the same thing, it also fails. So it's not specific to the case that I'm 
adding.-

 

Edit: I see, it's because {{ZkFailoverTest}} is a SolrCloudTestCase and the 
ZooKeeper server is left shut down at the end of the test, but no new instance 
is created at the start of the next test.


was (Author: cjcowie):
I see {{ZkFailoverTest}} was added for SOLR-5129, but because it does 
{{Thread.sleep(5000);}} with {{waitForZk}} set to 60, it doesn't stop the zk 
server for long enough to exceed either the configured timeout or the 
unconfigured DEFAULT_CLIENT_CONNECT_TIMEOUT of 30 seconds.

I've tried modifying the test to cover both a successful start and the 
configured timeout being exceeded, but I can't quite get it to work with both 
cases at the same time since I seem to end up with the server dead when the 
second test starts, and I'm not familiar enough with the way these tests are 
written to know what the right way to write them is.

If I simply duplicate the existing test method so that there are two test cases 
doing the same thing, it also fails. So it's not specific to the case that I'm 
adding. [^flawed-test.patch]

 

Edit: I see, it's because {{ZkFailoverTest}} is a SolrCloudTestCase and the 
ZooKeeper server is left shut down at the end of the test, but no new instance 
is created at the start of the next test.

> Solr does not respect waitForZk (SOLR_WAIT_FOR_ZK) property
> -----------------------------------------------------------
>
>                 Key: SOLR-14503
>                 URL: https://issues.apache.org/jira/browse/SOLR-14503
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public(Default Security Level. Issues are Public) 
>    Affects Versions: 7.1, 7.2, 7.2.1, 7.3, 7.3.1, 7.4, 7.5, 7.6, 7.7, 7.7.1, 
> 7.7.2, 8.0, 8.1, 8.2, 7.7.3, 8.1.1, 8.3, 8.4, 8.3.1, 8.5, 8.4.1, 8.5.1
>            Reporter: Colvin Cowie
>            Priority: Minor
>         Attachments: SOLR-14503.patch, SOLR-14503.patch
>
>
> When starting Solr in cloud mode, if zookeeper is not available within 30 
> seconds, then core container initialization fails and the node will not 
> recover when zookeeper is available.
>  
> I believe SOLR-5129 should have addressed this issue, however it doesn't 
> quite do so for two reasons:
>  # 
> [https://github.com/apache/lucene-solr/blob/master/solr/core/src/java/org/apache/solr/servlet/SolrDispatchFilter.java#L297]
>  it calls {{SolrZkClient(String zkServerAddress, int zkClientTimeout)}} 
> rather than {{SolrZkClient(String zkServerAddress, int zkClientTimeout, int 
> zkClientConnectTimeout)}} so the DEFAULT_CLIENT_CONNECT_TIMEOUT of 30 seconds 
> is used even when you specify a different waitForZk value
>  # bin/solr contains script logic to set -DwaitForZk from the SOLR_WAIT_FOR_ZK 
> environment variable 
> [https://github.com/apache/lucene-solr/blob/master/solr/bin/solr#L2148] but 
> there is no corresponding assignment in bin/solr.cmd, even though 
> SOLR_WAIT_FOR_ZK appears in solr.in.cmd as an example.
>  
> I will attach a patch that fixes the above.
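
The first point in the description is a general overload pitfall, and a small sketch may make it easier to see. This is not the Solr source: {{StubZkClient}} is a hypothetical stand-in that only mirrors the two constructor shapes named in the issue, and the 30 second default is taken from the description. Passing the {{waitForZk}} value (converted to milliseconds) as the third argument is an assumption about what a fix would look like, not a description of the attached patch.

```java
public class ConnectTimeoutSketch {
    // 30 seconds, as stated in the issue description.
    static final int DEFAULT_CLIENT_CONNECT_TIMEOUT = 30_000;

    /** Hypothetical stand-in with the same two constructor shapes as described in the issue. */
    static final class StubZkClient {
        final String zkServerAddress;
        final int zkClientTimeout;
        final int zkClientConnectTimeout;

        // Two-argument form: the connect timeout silently falls back to the default.
        StubZkClient(String zkServerAddress, int zkClientTimeout) {
            this(zkServerAddress, zkClientTimeout, DEFAULT_CLIENT_CONNECT_TIMEOUT);
        }

        // Three-argument form: the caller decides how long to wait for the connection.
        StubZkClient(String zkServerAddress, int zkClientTimeout, int zkClientConnectTimeout) {
            this.zkServerAddress = zkServerAddress;
            this.zkClientTimeout = zkClientTimeout;
            this.zkClientConnectTimeout = zkClientConnectTimeout;
        }
    }

    public static void main(String[] args) {
        int waitForZkSeconds = 60;                  // e.g. -DwaitForZk=60
        int waitForZkMillis = waitForZkSeconds * 1000;

        StubZkClient twoArg = new StubZkClient("localhost:2181", 30_000);
        StubZkClient threeArg = new StubZkClient("localhost:2181", 30_000, waitForZkMillis);

        System.out.println("two-arg connect timeout (ms):   " + twoArg.zkClientConnectTimeout);
        System.out.println("three-arg connect timeout (ms): " + threeArg.zkClientConnectTimeout);
    }
}
```

With the two-argument form the configured wait never reaches the client, so the connect timeout stays at the 30 second default whatever {{waitForZk}} is set to.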



