Chris Burroughs created CASSANDRA-5914:
------------------------------------------

             Summary: Failed replace_node bootstrap leaves gossip in weird state; possible perf problem
                 Key: CASSANDRA-5914
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-5914
             Project: Cassandra
          Issue Type: Bug
          Components: Core
         Environment: 1.2.8
            Reporter: Chris Burroughs


A node was down for a week or two due to a disk hardware failure.  I tried to use 
replace_node to bring up a new node on the same physical host with the same 
IPs.  (rbranson suspected that reusing the same IP may be problem-prone.)  The 
replace failed with "unable to find sufficient sources for streaming range".  
However, gossip for the to-be-replaced node was left in a funky state:

{noformat}
/64.215.255.182
  RACK:NOP
  NET_VERSION:6
  HOST_ID:4f3b214b-b03e-46eb-8214-5fab2662a06b
  RELEASE_VERSION:1.2.8
  DC:IAD
  INTERNAL_IP:10.15.2.182
  SCHEMA:59adb24e-f3cd-3e02-97f0-5b395827453f
  RPC_ADDRESS:0.0.0.0
{noformat}

(See CASSANDRA-5913 for cosmetic issue with nt:status.)

This seems (A) confusing, and (B) the failed replace_token correlated with 95th 
percentile read latency for this cluster going from 8k microseconds to around 
200k microseconds (in both DCs of a multi-DC cluster reading at CL.ONE).  I 
don't have a good theory for the correlation, but performance was bad for over 
an hour and returned to normal once a successful replace_token was performed.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
