Tim Owen created SOLR-9915:
------------------------------

             Summary: PeerSync alreadyInSync check is not backwards compatible
                 Key: SOLR-9915
                 URL: https://issues.apache.org/jira/browse/SOLR-9915
             Project: Solr
          Issue Type: Bug
      Security Level: Public (Default Security Level. Issues are Public)
          Components: replication (java)
    Affects Versions: 6.3
            Reporter: Tim Owen


The fingerprint check added to PeerSync in SOLR-9446 works fine when all 
servers are running 6.3, but it makes a rolling upgrade from e.g. 6.2.1 to 
6.3 difficult. A 6.3 server sends a request to a 6.2.1 server to get a 
fingerprint, then hits an NPE because the older server doesn't return the 
expected field in its response.

This causes PeerSync to fail completely and triggers a full index 
replication from scratch, copying all index files over the network. We noticed 
this when we tried to do a rolling upgrade of one of our 6.2.1 clusters to 
6.3. The resulting replication load was hammering our disks and network, so we 
had to do a full shutdown, upgrade every server to 6.3 and restart, which was 
not ideal for a production cluster.

The attached patch should behave more gracefully in this situation, as it will 
typically return false for alreadyInSync() and then carry on doing the normal 
re-sync based on versions.
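
To illustrate the kind of null-safe check described above, here is a minimal 
sketch. It is not the attached patch and not the actual PeerSync code: the 
class name, the method signature, and the "fingerprint" response key are all 
assumptions made for the example. The idea is only that a missing field in 
the peer's response is treated as "not in sync" rather than dereferenced.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch; names and the response key are assumptions,
// not taken from Solr's PeerSync or from the attached patch.
class FingerprintCheckSketch {

    // Assumed response key; the real field name in Solr may differ.
    static final String FINGERPRINT_KEY = "fingerprint";

    /**
     * Returns true only when the peer supplied a fingerprint and it equals ours.
     * A missing response or missing field (e.g. from an older peer) yields
     * false instead of an NPE, so the caller can fall back to the normal
     * version-based sync.
     */
    static boolean alreadyInSync(Map<String, Object> peerResponse, Object ourFingerprint) {
        if (peerResponse == null) {
            return false;
        }
        Object theirFingerprint = peerResponse.get(FINGERPRINT_KEY);
        if (theirFingerprint == null) {
            return false; // older peer: no fingerprint in the response
        }
        return theirFingerprint.equals(ourFingerprint);
    }

    public static void main(String[] args) {
        // Simulates an older peer whose response lacks the fingerprint field.
        Map<String, Object> oldPeer = new HashMap<>();

        // Simulates a newer peer that does return a fingerprint.
        Map<String, Object> newPeer = new HashMap<>();
        newPeer.put(FINGERPRINT_KEY, "abc123");

        System.out.println(alreadyInSync(oldPeer, "abc123"));
        System.out.println(alreadyInSync(newPeer, "abc123"));
    }
}
```

With a check of this shape, a mixed-version cluster would skip the 
fingerprint shortcut for older peers and proceed to the version-based 
re-sync, rather than failing and falling back to full replication.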



