[
https://issues.apache.org/jira/browse/SOLR-6591?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Shalin Shekhar Mangar updated SOLR-6591:
----------------------------------------
Attachment: SOLR-6591-ignore-no-collection-path.patch
{quote}
A rapid create+delete loop for collections with state format > 1 causes the
above exception to happen. This is because the updateZkState method assumes
that the collection exists and it tries to write to
/collections/collection_name/state.json directly without verifying whether the
/collections/collection_name zk node exists.
{quote}
This patch ignores state messages which are trying to create new collections
when the parent zk path doesn't exist. I've added the following comment in the
code to explain the situation:
{quote}
// if the /collections/collection_name path doesn't exist then it means that
// 1) the user invoked a DELETE collection API and the OverseerCollectionProcessor has deleted
//    this zk path.
// 2) these are most likely old "state" messages which are only being processed now because
//    if they were new "state" messages then in legacy mode, a new collection would have been
//    created with stateFormat = 1 (which is the default state format)
// 3) these can't be new "state" messages created for a new collection because
//    otherwise the OverseerCollectionProcessor would have already created this path
//    as part of the create collection API call -- which is the only way in which a collection
//    with stateFormat > 1 can possibly be created
{quote}
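For readers who want to see the shape of the check, here is a minimal, self-contained sketch of the idea. It is not the code from the patch: the `PathStore` interface, the `InMemoryPathStore` class, and the `shouldApplyStateMessage` method are hypothetical names invented for illustration, and the in-memory set merely stands in for ZooKeeper. It assumes that only collections with stateFormat > 1 keep their state under /collections/collection_name/state.json.

```java
import java.util.HashSet;
import java.util.Set;

public class StateMessageGuard {

    /** Hypothetical stand-in for a ZooKeeper client; only an existence check is modeled. */
    public interface PathStore {
        boolean exists(String path);
    }

    /** In-memory PathStore used purely for illustration (not a real ZooKeeper client). */
    public static class InMemoryPathStore implements PathStore {
        private final Set<String> paths = new HashSet<>();

        public void create(String path) {
            paths.add(path);
        }

        public void delete(String path) {
            paths.remove(path);
        }

        @Override
        public boolean exists(String path) {
            return paths.contains(path);
        }
    }

    /**
     * Decides whether a "state" message for a collection should be applied.
     * Assumption: with stateFormat > 1 the state lives under
     * /collections/<name>/state.json, so the parent path must already exist.
     * If it is missing, the message is treated as stale and ignored.
     */
    public static boolean shouldApplyStateMessage(PathStore zk, String collection, int stateFormat) {
        if (stateFormat <= 1) {
            // Assumption: the default format does not write under the per-collection path.
            return true;
        }
        return zk.exists("/collections/" + collection);
    }

    public static void main(String[] args) {
        InMemoryPathStore zk = new InMemoryPathStore();
        zk.create("/collections/halfcollection");
        System.out.println(shouldApplyStateMessage(zk, "halfcollection", 2));

        // Simulate a DELETE that removed the parent path before an old message is processed.
        zk.delete("/collections/halfcollection");
        System.out.println(shouldApplyStateMessage(zk, "halfcollection", 2));
    }
}
```

The point of the sketch is only that the existence check happens before any write is attempted, so a stale message after a DELETE is dropped instead of raising an exception in the queue loop.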
> Cluster state updates can be lost on exception in main queue loop
> -----------------------------------------------------------------
>
> Key: SOLR-6591
> URL: https://issues.apache.org/jira/browse/SOLR-6591
> Project: Solr
> Issue Type: Bug
> Components: SolrCloud
> Affects Versions: Trunk
> Reporter: Shalin Shekhar Mangar
> Assignee: Shalin Shekhar Mangar
> Fix For: Trunk
>
> Attachments: SOLR-6591-constructStateFix.patch,
> SOLR-6591-ignore-no-collection-path.patch, SOLR-6591-no-mixed-batches.patch,
> SOLR-6591.patch
>
>
> I found this bug while going through the failure on jenkins:
> https://builds.apache.org/job/Lucene-Solr-NightlyTests-trunk/648/
> {code}
> 2 tests failed.
> REGRESSION: org.apache.solr.cloud.CollectionsAPIDistributedZkTest.testDistribSearch
> Error Message:
> Error CREATEing SolrCore 'halfcollection_shard1_replica1': Unable to create core [halfcollection_shard1_replica1] Caused by: Could not get shard id for core: halfcollection_shard1_replica1
> Stack Trace:
> org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException: Error CREATEing SolrCore 'halfcollection_shard1_replica1': Unable to create core [halfcollection_shard1_replica1] Caused by: Could not get shard id for core: halfcollection_shard1_replica1
>         at org.apache.solr.client.solrj.impl.HttpSolrServer.executeMethod(HttpSolrServer.java:570)
>         at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:215)
>         at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:211)
>         at org.apache.solr.cloud.CollectionsAPIDistributedZkTest.testErrorHandling(CollectionsAPIDistributedZkTest.java:583)
>         at org.apache.solr.cloud.CollectionsAPIDistributedZkTest.doTest(CollectionsAPIDistributedZkTest.java:205)
>         at org.apache.solr.BaseDistributedSearchTestCase.testDistribSearch(BaseDistributedSearchTestCase.java:869)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         at java.lang.reflect.Method.invoke(Method.java:606)
>         at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1618)
> {code}