[jira] [Comment Edited] (CASSANDRA-11983) Migration task failed to complete

Stefan Podkowinski (JIRA) Mon, 20 Feb 2017 08:43:58 -0800

    [ 
https://issues.apache.org/jira/browse/CASSANDRA-11983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15874775#comment-15874775
 ]


Stefan Podkowinski edited comment on CASSANDRA-11983 at 2/20/17 4:42 PM:
-------------------------------------------------------------------------

bq. Strongly suspect this is a duplicate of CASSANDRA-12653 , which is 
patch-available, for anyone who is desperate for a fix (should be reviewed and 
committed soon).

[~jjirsa], I'm not really sure. First of all, I can't think of a reason why tasks 
accidentally triggered during the gossip shadow round would not be able to 
complete. Migration tasks are spawned for each node discovered by gossip, and 
contacting all of them will make the startup process slower as the cluster 
grows. The changes in CASSANDRA-12653 may help here, as the endpoint state maps 
won't be reset any longer. That might be relevant for this ticket because, in 
the worst case, migration tasks would be fired twice for all nodes: once 
accidentally triggered by the shadow round, and once more during regular gossip 
after the endpoint states have been cleared. But in any case, the startup 
process should not grind to a halt for minutes because of this.
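
To put rough numbers on the worst case described above, here is a toy model. It 
is not Cassandra code: the function name and both parameters are made up for 
illustration, and it assumes exactly one migration task per discovered peer per 
discovery pass, and no rediscovery when the endpoint states are kept.

```python
def migration_tasks_at_startup(cluster_size: int, reset_endpoint_states: bool) -> int:
    """Count the migration tasks a starting node fires in this toy worst-case model.

    Hypothetical helper for illustration only; it does not mirror any
    real Cassandra class or method.
    """
    # Every node in the ring except the one that is starting up.
    peers = max(cluster_size - 1, 0)

    # Worst case: the shadow round accidentally triggers one task per peer.
    shadow_round_tasks = peers

    # Assumption: if the endpoint states are cleared after the shadow round,
    # regular gossip rediscovers every peer and fires one more task for each.
    # If the states are kept, no peer is rediscovered in this model.
    regular_gossip_tasks = peers if reset_endpoint_states else 0

    return shadow_round_tasks + regular_gossip_tasks


if __name__ == "__main__":
    for reset in (True, False):
        count = migration_tasks_at_startup(1000, reset)
        print(f"reset_endpoint_states={reset}: {count} tasks")
```

For the 1k-node ring mentioned in the ticket, this model gives 1998 tasks when 
the endpoint states are reset and 999 when they are not. That is twice the work, 
but still only linear in cluster size, which fits the point that this alone 
should not stall startup for minutes.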

-It would also be interesting to know for this ticket if the test cluster has 
been configured with each node being a seed, or just a limited number of seed 
nodes. The gossip shadow round will contact all seeds, which is something we 
probably have to reconsider, in case we want to support clusters with hundreds 
of seed nodes.- Looking at the log, that is obviously not the case here.


was (Author: spo...@gmail.com):
bq. Strongly suspect this is a duplicate of CASSANDRA-12653 , which is 
patch-available, for anyone who is desperate for a fix (should be reviewed and 
committed soon).

[~jjirsa], I'm not really sure. First of all, I can't think of a reason why tasks 
accidentally triggered during the gossip shadow round would not be able to 
complete. Migration tasks are spawned for each node discovered by gossip, and 
contacting all of them will make the startup process slower as the cluster 
grows. The changes in CASSANDRA-12653 may help here, as the endpoint state maps 
won't be reset any longer. That might be relevant for this ticket because, in 
the worst case, migration tasks would be fired twice for all nodes: once 
accidentally triggered by the shadow round, and once more during regular gossip 
after the endpoint states have been cleared. But in any case, the startup 
process should not grind to a halt for minutes because of this.

It would also be interesting to know for this ticket if the test cluster has 
been configured with each node being a seed, or just a limited number of seed 
nodes. The gossip shadow round will contact all seeds, which is something we 
probably have to reconsider, in case we want to support clusters with hundreds 
of seed nodes.

> Migration task failed to complete
> ---------------------------------
>
>                 Key: CASSANDRA-11983
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-11983
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Lifecycle
>         Environment: Docker / Kubernetes running
> Linux cassandra-21 3.16.0-4-amd64 #1 SMP Debian 3.16.7-ckt25-1 (2016-03-06) 
> x86_64 GNU/Linux
> openjdk version "1.8.0_91"
> OpenJDK Runtime Environment (build 1.8.0_91-8u91-b14-1~bpo8+1-b14)
> OpenJDK 64-Bit Server VM (build 25.91-b14, mixed mode)
> Cassandra 3.5 installed from 
> deb-src http://www.apache.org/dist/cassandra/debian 35x main
>            Reporter: Chris Love
>            Assignee: Jeff Jirsa
>             Fix For: 3.0.x, 3.11.x
>
>         Attachments: cass.log
>
>
> When nodes are bootstrapping I am getting multiple errors: "Migration task 
> failed to complete", from MigrationManager.java
> The errors increase as more nodes are added to the ring, as I am creating a 
> ring of 1k nodes.
> Cassandra yaml is here: 
> https://github.com/k8s-for-greeks/gpmr/blob/3d50ff91a139b9c4a7a26eda0fb4dcf9a008fbed/pet-race-devops/docker/cassandra-debian/files/cassandra.yaml



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
