[ https://issues.apache.org/jira/browse/CASSANDRA-19580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17839901#comment-17839901 ]
Cameron Zemek edited comment on CASSANDRA-19580 at 4/23/24 1:03 AM:
--------------------------------------------------------------------
[~brandon.williams] do you know why a same-address replacement needs to use the hibernate state? CASSANDRA-8523 added the BOOT_REPLACE status. I am not sure what I would break by doing this:
{code:java}
public void prepareToJoin() throws ConfigurationException
{
    // omitted for brevity
    else if (isReplacingSameAddress())
    {
        // Previously this branch went into the hibernate state when replacing the same address (CASSANDRA-8523);
        // announce BOOT_REPLACE instead.
        logger.warn("Writes will not be forwarded to this node during replacement because it has the same address as " +
                    "the node to be replaced ({}). If the previous node has been down for longer than max_hint_window_in_ms, " +
                    "repair must be run after the replacement process in order to make this node consistent.",
                    DatabaseDescriptor.getReplaceAddress());
        appStates.put(ApplicationState.STATUS, valueFactory.bootReplacing(DatabaseDescriptor.getReplaceAddress()));
    }
    // omitted for brevity
}
{code}
This avoids the issue because the node is no longer put into hibernate during the replacement, so if the replacement fails, the node is not left in a dead state.
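A simplified model of why the announced status matters may help here. It is not Cassandra's code: it assumes that, as with Gossiper's dead states, peers treat a HIBERNATE endpoint like one that has left and stop sending it SYNs, while BOOT_REPLACE is not a dead state. `Status`, `DEAD_STATES` and `peersWillGossipTo` are hypothetical names used only for illustration.

{code:java}
import java.util.EnumSet;
import java.util.Set;

// Simplified model (not Cassandra code): the STATUS a node last advertised
// before it was killed part-way through a same-address replacement.
enum Status { NORMAL, BOOT_REPLACE, HIBERNATE, LEFT }

class ReplacementStatusModel {
    // Assumption: like Gossiper's dead states, HIBERNATE is grouped with LEFT,
    // while BOOT_REPLACE is not a dead state.
    static final Set<Status> DEAD_STATES = EnumSet.of(Status.HIBERNATE, Status.LEFT);

    // Hypothetical rule: peers only send SYNs to endpoints whose last known
    // status is not a dead state.
    static boolean peersWillGossipTo(Status lastAdvertised) {
        return !DEAD_STATES.contains(lastAdvertised);
    }

    public static void main(String[] args) {
        // Current behavior: the replacing node announced HIBERNATE.
        System.out.println("HIBERNATE    -> receives SYNs: " + peersWillGossipTo(Status.HIBERNATE));
        // Proposed change: the replacing node announces BOOT_REPLACE.
        System.out.println("BOOT_REPLACE -> receives SYNs: " + peersWillGossipTo(Status.BOOT_REPLACE));
    }
}
{code}

Under these assumptions the first line prints `false` and the second `true`: a node that crashed while advertising BOOT_REPLACE is still a gossip target after it restarts.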
> Unable to contact any seeds with node in hibernate status
> ---------------------------------------------------------
>
> Key: CASSANDRA-19580
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19580
> Project: Cassandra
> Issue Type: Bug
> Reporter: Cameron Zemek
> Priority: Normal
>
> We have a customer running into the error 'Unable to contact any seeds!'. I
> have been able to reproduce this issue by killing Cassandra while it is
> joining, which puts the node into hibernate status. Once a node is in
> hibernate, it no longer receives any SYN messages from other nodes during
> startup, and because its outbound SYN messages contain only its own digest,
> it never receives any states in the ACK replies. So when it reaches the
> `seenAnySeed` check, the check fails because the endpointStateMap is empty.
>
> A workaround is to copy the system.peers table from another node, but this is
> less than ideal. I tested modifying maybeGossipToSeed as follows:
> {code:java}
> /* Possibly gossip to a seed for facilitating partition healing */
> private void maybeGossipToSeed(MessageOut<GossipDigestSyn> prod)
> {
>     int size = seeds.size();
>     if (size > 0)
>     {
>         if (size == 1 && seeds.contains(FBUtilities.getBroadcastAddress()))
>         {
>             return;
>         }
>         if (liveEndpoints.size() == 0)
>         {
>             List<GossipDigest> gDigests = prod.payload.gDigests;
>             if (gDigests.size() == 1 && gDigests.get(0).endpoint.equals(FBUtilities.getBroadcastAddress()))
>             {
>                 // Only our own digest: send an empty digest list instead, so the seed
>                 // replies with all of its endpoint states (as in the shadow round).
>                 GossipDigestSyn digestSynMessage = new GossipDigestSyn(DatabaseDescriptor.getClusterName(),
>                                                                        DatabaseDescriptor.getPartitionerName(),
>                                                                        new ArrayList<>());
>                 MessageOut<GossipDigestSyn> message = new MessageOut<>(MessagingService.Verb.GOSSIP_DIGEST_SYN,
>                                                                        digestSynMessage,
>                                                                        GossipDigestSyn.serializer);
>                 sendGossip(message, seeds);
>             }
>             else
>             {
>                 sendGossip(prod, seeds);
>             }
>         }
>         else
>         {
>             /* Gossip with the seed with some probability. */
>             double probability = seeds.size() / (double) (liveEndpoints.size() + unreachableEndpoints.size());
>             double randDbl = random.nextDouble();
>             if (randDbl <= probability)
>                 sendGossip(prod, seeds);
>         }
>     }
> }
> {code}
> The only problem is that this is the same as the SYN sent during the shadow
> round. It does resolve the issue, however, because the node then receives an
> ACK with all the states.
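A toy model may make the failure and the proposed fix easier to follow. It is a simplified sketch, not Cassandra's gossip code: endpoints are plain strings, each endpoint state is reduced to a single version number, and `SeedRoundModel`, `ack`, `digestForSeed` and `startupRound` are hypothetical names. It assumes, following the description above, that a seed's ACK only carries states for endpoints named in the SYN digest (and, as a further assumption, only when the seed's copy is newer), while an empty digest, as in the shadow round, is answered with every known state.

{code:java}
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Toy model (not Cassandra code) of a restarting, hibernated node's SYN/ACK
// exchange with a seed. Endpoints are strings; states are version numbers.
class SeedRoundModel {
    static final String SELF = "10.0.0.9";   // hypothetical restarting node
    static final String SEED = "10.0.0.1";   // hypothetical seed
    static final Set<String> SEEDS = Set.of(SEED);

    // Seed's view: endpoint -> newest version it holds (made-up numbers).
    static final Map<String, Long> SEED_VIEW =
            Map.of(SEED, 500L, "10.0.0.2", 450L, SELF, 40L);

    // Simplified ACK: an empty digest (shadow-round style) gets every state;
    // otherwise only digest endpoints for which the seed holds a newer version.
    static Map<String, Long> ack(Map<String, Long> digest) {
        if (digest.isEmpty()) {
            return new HashMap<>(SEED_VIEW);
        }
        Map<String, Long> reply = new HashMap<>();
        digest.forEach((endpoint, version) -> {
            Long known = SEED_VIEW.get(endpoint);
            if (known != null && known > version) {
                reply.put(endpoint, known);
            }
        });
        return reply;
    }

    // Patched choice: with no live endpoints and a digest naming only ourselves,
    // send an empty digest instead. The probabilistic branch taken when live
    // endpoints exist is left out of this model.
    static Map<String, Long> digestForSeed(int liveEndpoints, Map<String, Long> digest, boolean patched) {
        boolean onlySelf = digest.size() == 1 && digest.containsKey(SELF);
        return (patched && liveEndpoints == 0 && onlySelf) ? Map.of() : digest;
    }

    // Analogue of the seenAnySeed check: does the local state map contain a seed?
    static boolean seenAnySeed(Map<String, Long> endpointStateMap) {
        return endpointStateMap.keySet().stream().anyMatch(SEEDS::contains);
    }

    // One startup round: no peer sends us a SYN, so the seed's ACK is our only
    // source of other endpoints' states. Merging keeps the newer version.
    static boolean startupRound(boolean patched) {
        Map<String, Long> local = new HashMap<>(Map.of(SELF, 1_000L));
        Map<String, Long> reply = ack(digestForSeed(0, Map.copyOf(local), patched));
        reply.forEach((endpoint, version) -> local.merge(endpoint, version, Long::max));
        return seenAnySeed(local);
    }

    public static void main(String[] args) {
        System.out.println("original: seenAnySeed = " + startupRound(false));
        System.out.println("patched:  seenAnySeed = " + startupRound(true));
    }
}
{code}

In this model the original round prints `false` (the ACK is empty, so only the node itself is known) and the patched round prints `true`, matching the behavior reported above. The trade-off noted in the issue shows up directly: the patched SYN is indistinguishable from a shadow-round SYN.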
--
This message was sent by Atlassian Jira
(v8.20.10#820010)