[
https://issues.apache.org/jira/browse/SOLR-11484?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16248775#comment-16248775
]
Steve Rowe commented on SOLR-11484:
-----------------------------------
Reproducing failure of a test added on this issue, from
[https://jenkins.thetaphi.de/job/Lucene-Solr-7.x-MacOSX/302/] (reproduces for
me on master branch on Linux):
{noformat}
Checking out Revision 6b5fbd3265e6819469e1b70a68908767fc14dd87
(refs/remotes/origin/branch_7x)
[...]
[junit4] 2> NOTE: reproduce with: ant test -Dtestcase=CloudSolrClientTest
-Dtests.method=testRetryUpdatesWhenClusterStateIsStale
-Dtests.seed=3D8409F14A788F1 -Dtests.slow=true -Dtests.locale=fr-BL
-Dtests.timezone=MET -Dtests.asserts=true -Dtests.file.encoding=UTF-8
[junit4] ERROR 5.59s J1 |
CloudSolrClientTest.testRetryUpdatesWhenClusterStateIsStale <<<
[junit4] > Throwable #1:
org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error
from server at
http://127.0.0.1:65235/solr/stale_state_test_col_shard1_replica_n1: Expected
mime type application/octet-stream but got text/html. <html>
[junit4] > <head>
[junit4] > <meta http-equiv="Content-Type"
content="text/html;charset=ISO-8859-1"/>
[junit4] > <title>Error 404 </title>
[junit4] > </head>
[junit4] > <body>
[junit4] > <h2>HTTP ERROR: 404</h2>
[junit4] > <p>Problem accessing
/solr/stale_state_test_col_shard1_replica_n1/update. Reason:
[junit4] > <pre> Can not find:
/solr/stale_state_test_col_shard1_replica_n1/update</pre></p>
[junit4] > <hr /><a href="http://eclipse.org/jetty">Powered by Jetty://
9.3.20.v20170531</a><hr/>
[junit4] > </body>
[junit4] > </html>
[junit4] > at
__randomizedtesting.SeedInfo.seed([3D8409F14A788F1:B7E9D877F74EFEDD]:0)
[junit4] > at
org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:607)
[junit4] > at
org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:255)
[junit4] > at
org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:244)
[junit4] > at
org.apache.solr.client.solrj.impl.LBHttpSolrClient.doRequest(LBHttpSolrClient.java:483)
[junit4] > at
org.apache.solr.client.solrj.impl.LBHttpSolrClient.request(LBHttpSolrClient.java:413)
[junit4] > at
org.apache.solr.client.solrj.impl.CloudSolrClient.directUpdate(CloudSolrClient.java:559)
[junit4] > at
org.apache.solr.client.solrj.impl.CloudSolrClient.sendRequest(CloudSolrClient.java:1016)
[junit4] > at
org.apache.solr.client.solrj.impl.CloudSolrClient.requestWithRetryOnStaleState(CloudSolrClient.java:883)
[junit4] > at
org.apache.solr.client.solrj.impl.CloudSolrClient.request(CloudSolrClient.java:816)
[junit4] > at
org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:194)
[junit4] > at
org.apache.solr.client.solrj.request.UpdateRequest.commit(UpdateRequest.java:233)
[junit4] > at
org.apache.solr.client.solrj.impl.CloudSolrClientTest.testRetryUpdatesWhenClusterStateIsStale(CloudSolrClientTest.java:844)
[junit4] > at
java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
[junit4] > at
java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
[junit4] > at
java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
[junit4] > at
java.base/java.lang.reflect.Method.invoke(Method.java:564)
[junit4] > at java.base/java.lang.Thread.run(Thread.java:844)
[...]
[junit4] 2> NOTE: test params are: codec=CheapBastard,
sim=RandomSimilarity(queryNorm=false): {}, locale=fr-BL, timezone=MET
{noformat}
> CloudSolrClient's cache of collection clusterstate can cause RouteExceptions
> when attempting directUpdates after collection modifications
> -----------------------------------------------------------------------------------------------------------------------------------------
>
> Key: SOLR-11484
> URL: https://issues.apache.org/jira/browse/SOLR-11484
> Project: Solr
> Issue Type: Bug
> Security Level: Public(Default Security Level. Issues are Public)
> Reporter: Hoss Man
> Assignee: Noble Paul
> Fix For: 7.2, master (8.0)
>
> Attachments: SOLR-11484.patch, SOLR-11484.patch,
> jenkins.thetaphi.20662.txt
>
>
> This was discovered while auditing jenkins failures from
> {{TestCollectionsAPIViaSolrCloudCluster.testCollectionCreateSearchDelete}}
> (where a test explicitly deletes and then recreates a collection with the
> same name), but as noted in a comment below, SOLR-11392 is another example of
> non-obvious test failures that can pop up because of this bug.
> In practice, it can affect any CloudSolrClient user after changes have been
> made to a collection (to add/move replicas, etc...)
> ----
> Original jira notes...
> {{TestCollectionsAPIViaSolrCloudCluster.testCollectionCreateSearchDelete}}
> seems to fail with non-trivial frequency, so I grabbed the logs from a recent
> failure and started trying to follow along with the actions to figure out
> what exactly is happening....
> https://jenkins.thetaphi.de/job/Lucene-Solr-master-Linux/20662/
> {noformat}
> [junit4] ERROR 20.3s J1 |
> TestCollectionsAPIViaSolrCloudCluster.testCollectionCreateSearchDelete <<<
> [junit4] > Throwable #1:
> org.apache.solr.client.solrj.impl.CloudSolrClient$RouteException: Error from
> server at https://127.0.0.1:42959/solr/testcollection_shard1_replica_n3:
> Expected mime type application/octet-stream but got text/html. <html>
> [junit4] > <head>
> [junit4] > <meta http-equiv="Content-Type"
> content="text/html;charset=ISO-8859-1"/>
> [junit4] > <title>Error 404 </title>
> {noformat}
> The crux of this failure appears to be a genuine bug in how CloudSolrClient
> uses its cached ClusterState info when doing (direct) updates. The key bits
> seem to be:
> * CloudSolrClient does _something_ (update,query,etc...) with a collection
> causing the current cluster state for the collection to be cached
> * The actual collection changes such that a Solr node/core no longer exists
> as part of the collection
> * CloudSolrClient is asked to process an UpdateRequest which triggers the
> code paths for the {{directUpdate()}} method -- which attempts to route the
> updates directly to a replica of the appropriate shard using the (cached)
> collection state info
> * CloudSolrClient (may) attempt to send that UpdateRequest to a node/core
> that doesn't exist, getting a 404 -- which does not (seem to) trigger a state
> refresh, or retry to find a correct URL to resend the update to.
> Details to follow in comment....
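
To make the sequence in the bullets above concrete, here is a minimal,
self-contained sketch of the general "treat a 404 as possibly stale state,
refresh, and retry" idea. It is not the SOLR-11484 patch and uses no SolrJ
classes: StaleStateRetrySketch, CoreNotFoundException, liveState, cachedState
and the URLs are all hypothetical stand-ins (liveState plays the role of the
authoritative cluster state, cachedState the client-side cache).

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch only; none of these names are SolrJ APIs. */
class StaleStateRetrySketch {

    /** Stand-in for an HTTP 404 returned for a core that no longer exists. */
    static class CoreNotFoundException extends RuntimeException {
        CoreNotFoundException(String url) { super("404: " + url); }
    }

    /** Stand-in for the authoritative state: collection name -> core URL. */
    private final Map<String, String> liveState;
    /** Client-side cache of the last state seen for each collection. */
    private final Map<String, String> cachedState = new HashMap<>();
    /** Number of times the cache was (re)populated from liveState. */
    int refreshes = 0;

    StaleStateRetrySketch(Map<String, String> liveState) {
        this.liveState = liveState;
    }

    /** Returns the cached URL, fetching from liveState only on a cache miss. */
    private String lookup(String collection) {
        String url = cachedState.get(collection);
        if (url == null) {
            url = liveState.get(collection);
            if (url == null) {
                throw new IllegalArgumentException("no such collection: " + collection);
            }
            cachedState.put(collection, url);
            refreshes++;
        }
        return url;
    }

    /** Simulated server: only the URL currently in liveState accepts updates. */
    private String send(String collection, String url, String doc) {
        if (!url.equals(liveState.get(collection))) {
            throw new CoreNotFoundException(url);
        }
        return "indexed " + doc + " at " + url;
    }

    /** Sends an update; on a 404 drops the cached entry and retries. */
    String update(String collection, String doc, int maxRetries) {
        for (int attempt = 0; ; attempt++) {
            String url = lookup(collection);
            try {
                return send(collection, url, doc);
            } catch (CoreNotFoundException e) {
                if (attempt >= maxRetries) {
                    throw e; // give up and surface the 404 to the caller
                }
                cachedState.remove(collection); // assume the cache is stale
            }
        }
    }

    public static void main(String[] args) {
        Map<String, String> live = new HashMap<>();
        live.put("col", "http://host/solr/col_shard1_replica_n1");
        StaleStateRetrySketch client = new StaleStateRetrySketch(live);
        System.out.println(client.update("col", "doc1", 1));
        // Simulate the collection being deleted and recreated on a new core.
        live.put("col", "http://host/solr/col_shard1_replica_n3");
        System.out.println(client.update("col", "doc2", 1));
    }
}
```

With maxRetries set to 0 the second update in main would fail with the
simulated 404, which mirrors the behaviour described in the last bullet; with
one retry the stale entry is dropped and the update reaches the new core.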
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)