[jira] [Updated] (SOLR-4046) An instance of CloudSolrServer is not able to handle consecutive request on different collections o.a.

Per Steffensen (JIRA) Wed, 07 Nov 2012 07:41:13 -0800

     [ 
https://issues.apache.org/jira/browse/SOLR-4046?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Per Steffensen updated SOLR-4046:
---------------------------------

    Attachment: SOLR-4046.patch

I have made the following patch in our local version of Solr.

The patch could be done in various ways, but I decided to get rid of 
unnecessary code complexity at the expense of a negligible performance 
optimization. So the idea of calculating and "caching" the different lists 
and only recalculating them on clusterState changes is gone. The lists are 
calculated from the in-memory clusterState, so it cannot take many ms to 
calculate them for every request, and the additional GC that results should 
also be negligible. The good thing is that the code becomes easier to read and 
understand.

Well, of course you can choose a different approach.
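
The per-request calculation described above can be sketched roughly as 
follows. This is only an illustration, not the code in SOLR-4046.patch: the 
ReplicaInfo and PerRequestUrlLists classes, the urlsFor method and the 
Map-based stand-in for clusterState are all hypothetical names made up for 
the example.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for one replica entry in the in-memory clusterState.
final class ReplicaInfo {
    final String url;
    final boolean leader;

    ReplicaInfo(String url, boolean leader) {
        this.url = url;
        this.leader = leader;
    }
}

// Hypothetical helper; an assumption for illustration, not Solr's API.
final class PerRequestUrlLists {
    // Builds the URL list for one request. No instance state is read or
    // written, so consecutive requests on different collections cannot
    // see each other's lists.
    static List<String> urlsFor(Map<String, List<ReplicaInfo>> clusterState,
                                String collection, boolean sendToLeader) {
        List<ReplicaInfo> replicas =
                clusterState.getOrDefault(collection, Collections.emptyList());
        List<String> leaders = new ArrayList<>();
        List<String> others = new ArrayList<>();
        List<String> all = new ArrayList<>();
        for (ReplicaInfo r : replicas) {
            all.add(r.url);
            (r.leader ? leaders : others).add(r.url);
        }
        if (!sendToLeader) {
            // Any node will do: all URLs in clusterState order.
            return all;
        }
        // Leaders first, the remaining replicas as fallbacks.
        List<String> result = new ArrayList<>(leaders);
        result.addAll(others);
        return result;
    }
}
```

Since the lists are local to each call, there is no hash code to keep in 
sync and no way for a list built for one collection to be reused for another.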
                
> An instance of CloudSolrServer is not able to handle consecutive request on 
> different collections o.a.
> ------------------------------------------------------------------------------------------------------
>
>                 Key: SOLR-4046
>                 URL: https://issues.apache.org/jira/browse/SOLR-4046
>             Project: Solr
>          Issue Type: Bug
>          Components: clients - java, SolrCloud
>    Affects Versions: 4.0
>         Environment: Solr 4.0.0. Actually revision 1394844 on branch 
> lucene_solr_4_0 but I believe that is the same
>            Reporter: Per Steffensen
>            Priority: Critical
>         Attachments: SOLR-4046.patch
>
>
> CloudSolrServer saves urlList, leaderUrlList and replicasList on instance 
> level, and only recalculates those lists in case of clusterState changes. The 
> values calculated for the lists will be different for different 
> target-collections. Therefore they also ought to be recalculated for a request 
> R, if the target-collection for R is different from the target-collection for 
> the request handled just before R by the same CloudSolrServer instance.
> Another problem with the implementation in CloudSolrServer is with the 
> lastClusterStateHashCode. lastClusterStateHashCode is updated when the first 
> request after a clusterState-change is handled. Before the 
> lastClusterStateHashCode is updated, one of the following two sets of lists 
> is updated:
> * In case sendToLeader==true for the request: leaderUrlList and replicasList  
> are updated, but not urlList
> * In case sendToLeader==false for the request: urlList is updated, but not 
> leaderUrlList and replicasList
> But the lastClusterStateHashCode is always updated. So even if there were 
> just one collection in the world, there would be a problem: If the first request 
> after a clusterState-change is a sendToLeader==true-request urlList will 
> (potentially) be wrong (and will not be recalculated) for the next 
> sendToLeader==false-request to the same CloudSolrServer instance. If the 
> first request after a clusterState-change is a sendToLeader==false-request 
> leaderUrlList and replicasList will (potentially) be wrong (and will not be 
> recalculated) for the next sendToLeader==true-request to the same 
> CloudSolrServer instance.
> Besides that it is a very bad idea to have instance- and 
> local-method-variables with the same name. CloudSolrServer has an instance 
> variable called urlList and method CloudSolrServer.request has a 
> local-method-variable called urlList and the method also operates on instance 
> variable urlList. This makes the code hard to read.
> I haven't made a test within Apache Solr to reproduce the main error (the one 
> mentioned at the top above), but I guess you can easily do it yourself:
> Make a setup with two collections "collection1" and "collection2" - no 
> default collection. Add some documents to "collection2" (without any 
> autocommit). Then do cloudSolrServer.commit("collection1") and afterwards 
> cloudSolrServer.commit("collection2") (use same instance of CloudSolrServer). 
> Then try to search collection2 for the documents you inserted into it. They 
> ought to be found, but are not, because the 
> cloudSolrServer.commit("collection2") will not do a commit of collection2 - 
> it will actually do a commit of collection1.
> Well, actually you can't do cloudSolrServer.commit(<collection-name>) (the 
> method doesn't exist), but that ought to be corrected too. But you can do the 
> following instead:
> {code}
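> // Commit one specific collection by setting the collection parameter.
> UpdateRequest req = new UpdateRequest();
> req.setAction(UpdateRequest.ACTION.COMMIT, true, true); // waitFlush, waitSearcher
> req.setParam(CoreAdminParams.COLLECTION, <collection-name>);
> req.process(cloudSolrServer);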
> {code}
> In general I think you should add more tests to your test suite: tests that 
> run Solr clusters with more than one collection and make clever checks on 
> them.
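
The two caching problems in the description above can be reproduced in a 
small, self-contained model. This is a simplified sketch of the behaviour as 
the report describes it, not the actual CloudSolrServer code: the 
CachedListsModel class, its Map-based clusterState, the caller-controlled 
clusterStateHashCode field and the "first URL is the leader" rule are all 
assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Simplified, hypothetical model of the caching described in the issue.
final class CachedListsModel {
    // collection -> node URLs; by assumption the first URL is the leader.
    final Map<String, List<String>> clusterState = new HashMap<>();
    // Bumped by the caller to simulate a clusterState change.
    int clusterStateHashCode = 1;

    // Instance-level caches, as in the reported implementation.
    private int lastClusterStateHashCode = 0;
    private List<String> urlList = new ArrayList<>();
    private List<String> leaderUrlList = new ArrayList<>();

    List<String> request(String collection, boolean sendToLeader) {
        if (clusterStateHashCode != lastClusterStateHashCode) {
            List<String> urls =
                    clusterState.getOrDefault(collection, Collections.emptyList());
            if (sendToLeader) {
                // Only the leader list is refreshed on this path.
                leaderUrlList = new ArrayList<>(urls.subList(0, Math.min(1, urls.size())));
            } else {
                // Only urlList is refreshed on this path.
                urlList = new ArrayList<>(urls);
            }
            // Always updated, whichever list was refreshed and whichever
            // collection the request targeted.
            lastClusterStateHashCode = clusterStateHashCode;
        }
        return sendToLeader ? leaderUrlList : urlList;
    }
}
```

In this model, request("collection1", false) followed by 
request("collection2", false) on the same instance returns the collection1 
URLs both times. Likewise, on a fresh instance, request("collection1", true) 
followed by request("collection1", false) returns an empty urlList, because 
the hash code was already marked as current by the first request.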

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
