Re: Bootstrapping taking long
My nodes all have themselves in their list of seeds - always did - and everything works. (You may ask why I did this. I don't know, I must have copied it from an example somewhere.)

On Wed, Jan 5, 2011 at 9:42 AM, Ran Tavory ran...@gmail.com wrote:

I was able to make the node join the ring, but I'm confused. When I first added the node, it was not in its own seeds list. AFAIK this is how it's supposed to be. It was able to transfer all data to itself from other nodes, but then it stayed in the bootstrapping state. So what I did (and I don't know why it works) was add this node to the seeds list in its own storage-conf.xml file and restart the server, and then I finally saw it in the ring... If I had added the node to its own seeds list when first joining it, it would not join the ring, but doing it in two phases did work. So it's either my misunderstanding or a bug...

On Wed, Jan 5, 2011 at 7:14 AM, Ran Tavory ran...@gmail.com wrote:

The new node does not see itself as part of the ring; it sees all others but itself, so from that perspective the view is consistent. The only problem is that the node never finishes bootstrapping. It stays in this state for hours (it's been 20 hours now...)

$ bin/nodetool -p 9004 -h localhost streams
Mode: Bootstrapping
Not sending any streams.
Not receiving any streams.

On Wed, Jan 5, 2011 at 1:20 AM, Nate McCall n...@riptano.com wrote:

Does the new node have itself in the list of seeds, perchance? This could cause some issues if so.

On Tue, Jan 4, 2011 at 4:10 PM, Ran Tavory ran...@gmail.com wrote:

I'm still at a loss. I haven't been able to resolve this. I tried adding another node at a different location on the ring, but this node too remains stuck in the bootstrapping state for many hours without any of the other nodes being busy with anti-compaction or anything else. I don't know what's keeping it from finishing the bootstrap: no CPU, no I/O, files were already streamed, so what is it waiting for? I read the release notes of 0.6.7 and 0.6.8 and there didn't seem to be anything addressing a similar issue, so I figured there was no point in upgrading. But let me know if you think there is. Or any other advice...

On Tuesday, January 4, 2011, Ran Tavory ran...@gmail.com wrote:

Thanks Jake, but unfortunately the streams directory is empty, so I don't think that any of the nodes is anti-compacting data right now or has been in the past 5 hours. It seems that all the data was already transferred to the joining host, but the joining node, after having received the data, still remains in bootstrapping mode and does not join the cluster. I'm not sure that *all* data was transferred (perhaps other nodes need to transfer more data), but nothing is actually happening, so I assume all has been moved. Perhaps it's a configuration error on my part. Should I use AutoBootstrap=true? Anything else I should look out for in the configuration file or elsewhere?

On Tue, Jan 4, 2011 at 4:08 PM, Jake Luciani jak...@gmail.com wrote:

In 0.6, locate the node doing anti-compaction and look in the streams subdirectory in the keyspace data dir to monitor the anti-compaction progress (it puts new SSTables for the bootstrapping node in there).

On Tue, Jan 4, 2011 at 8:01 AM, Ran Tavory ran...@gmail.com wrote:

Running nodetool decommission didn't help. Actually the node refused to decommission itself (b/c it wasn't part of the ring). So I simply stopped the process, deleted all the data directories and started it again. It worked in the sense that the node bootstrapped again, but as before, after it had finished moving the data nothing happened for a long time (I'm still waiting, but nothing seems to be happening). Any hints on how to analyze a stuck bootstrapping node? Thanks.

On Tue, Jan 4, 2011 at 1:51 PM, Ran Tavory ran...@gmail.com wrote:

Thanks Shimi, so indeed anticompaction was run on one of the other nodes from the same DC, but to my understanding it has already ended, a few hours ago... I see plenty of log messages such as [1], which ended a couple of hours ago, and I've seen the new node streaming and accepting the data from the node which performed the anticompaction. So far it was normal, so it seemed that the data is in its right place. But now the new node seems sort of stuck. None of the other nodes is anticompacting right now or has been anticompacting since then. The new node's CPU is close to zero and its iostats are almost zero, so I can't find another bottleneck that would keep it hanging. On IRC someone suggested I retry joining this node, e.g. decommission it and rejoin it. I'll try it now...

[1] INFO [COMPACTION-POOL:1] 2011-01-04 04:04:09,721 CompactionManager.java (line 338) AntiCompacting
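
The `nodetool streams` output quoted above (Mode: Bootstrapping with no streams in either direction) is the symptom being discussed throughout this thread. As a rough sketch only, a check for that state could look like the following. It assumes the output text is exactly the three lines shown in the thread; the function names `parse_streams_output` and `looks_stuck` are made up for illustration and are not part of nodetool.

```python
def parse_streams_output(text):
    """Parse the text printed by `nodetool streams`, as quoted in this thread.

    Only the three lines seen in the thread are recognized; the format of
    output for active streams is not shown there, so it is not guessed here.
    """
    mode = None
    no_sending = False
    no_receiving = False
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("Mode:"):
            mode = line.split(":", 1)[1].strip()
        elif line == "Not sending any streams.":
            no_sending = True
        elif line == "Not receiving any streams.":
            no_receiving = True
    return {"mode": mode, "no_sending": no_sending, "no_receiving": no_receiving}


def looks_stuck(status):
    """True when the node reports Bootstrapping but no stream activity at all."""
    return (
        status["mode"] == "Bootstrapping"
        and status["no_sending"]
        and status["no_receiving"]
    )


if __name__ == "__main__":
    sample = (
        "Mode: Bootstrapping\n"
        "Not sending any streams.\n"
        "Not receiving any streams.\n"
    )
    print(looks_stuck(parse_streams_output(sample)))
```

A single idle reading is not proof of a hang, since a node may legitimately be idle between streams. In the thread the state persisted for hours, so in practice one would repeat the check over time before concluding anything.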
Re: Bootstrapping taking long
Well, your ring issues don't make sense to me; the seed list should be the same across the cluster. I'm just thinking of other things to try. Non-bootstrapped nodes should join the ring instantly, but reads will fail if you aren't using quorum.

On Wed, Jan 5, 2011 at 8:51 AM, Ran Tavory ran...@gmail.com wrote:

I haven't tried repair. Should I?

On Jan 5, 2011 3:48 PM, Jake Luciani jak...@gmail.com wrote:

Have you tried not bootstrapping but setting the token and manually calling repair?

On Wed, Jan 5, 2011 at 7:07 AM, Ran Tavory ran...@gmail.com wrote:

My conclusion is lame: I tried this on several hosts and saw the same behavior. The only way I was able to join new nodes was to first start them when they are *not in* their own seeds list and, after they finish transferring the data, restart them with themselves *in* their own seeds list. After doing that the node would join the ring. This is either my misunderstanding or a bug, but the only place I found it documented stated that the new node should not be in its own seeds list. Version 0.6.6.
Re: Bootstrapping taking long
In storage-conf I see this comment [1], from which I understand that the recommended way to bootstrap a new node is to set AutoBootstrap=true and remove the node from its own seeds list. Moreover, I did try to set AutoBootstrap=true and have the node in its own seeds list, but it would not bootstrap. I don't recall the exact message, but it was something like "I found myself in the seeds list, therefore I'm not going to bootstrap even though AutoBootstrap is true."

[1]
<!--
 ~ Turn on to make new [non-seed] nodes automatically migrate the right data
 ~ to themselves. (If no InitialToken is specified, they will pick one
 ~ such that they will get half the range of the most-loaded node.)
 ~ If a node starts up without bootstrapping, it will mark itself bootstrapped
 ~ so that you can't subsequently accidently bootstrap a node with
 ~ data on it. (You can reset this by wiping your data and commitlog
 ~ directories.)
 ~
 ~ Off by default so that new clusters and upgraders from 0.4 don't
 ~ bootstrap immediately. You should turn this on when you start adding
 ~ new nodes to a cluster that already has data on it. (If you are upgrading
 ~ from 0.4, start your cluster with it off once before changing it to true.
 ~ Otherwise, no data will be lost but you will incur a lot of unnecessary
 ~ I/O before your cluster starts up.)
-->
<AutoBootstrap>false</AutoBootstrap>

On Wed, Jan 5, 2011 at 4:58 PM, David Boxenhorn da...@lookin2.com wrote:

If the seed list should be the same across the cluster, that means that nodes *should* have themselves as a seed. If that doesn't work for Ran, then that is the first problem, no?
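
The behavior Ran describes (a node that finds itself in its own seeds list declines to bootstrap even with AutoBootstrap=true) can be sketched as a small pre-flight check on the config file. This is only an illustration of the rule as reported in this thread, not Cassandra's actual startup code. `AutoBootstrap` comes from the quoted comment; the `Storage`, `ListenAddress`, and `Seeds`/`Seed` element names are assumed from the usual 0.6 storage-conf.xml layout, and the addresses and the `bootstrap_plan` helper are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical minimal storage-conf.xml fragment; element names other than
# AutoBootstrap are assumptions about the 0.6 config layout.
SAMPLE_CONF = """<Storage>
  <AutoBootstrap>true</AutoBootstrap>
  <ListenAddress>10.0.0.5</ListenAddress>
  <Seeds>
    <Seed>10.0.0.1</Seed>
    <Seed>10.0.0.5</Seed>
  </Seeds>
</Storage>"""


def bootstrap_plan(conf_xml):
    """Report whether a node with this config would be expected to bootstrap.

    Rule taken from the thread: a node listed in its own seeds will not
    bootstrap, even if AutoBootstrap is true.
    """
    root = ET.fromstring(conf_xml)
    auto = (root.findtext("AutoBootstrap") or "false").strip().lower() == "true"
    listen = (root.findtext("ListenAddress") or "").strip()
    seeds = [s.text.strip() for s in root.findall("Seeds/Seed") if s.text]
    # Naive string comparison: a hostname and its IP would not match here.
    is_own_seed = listen != "" and listen in seeds
    return {
        "auto_bootstrap": auto,
        "is_own_seed": is_own_seed,
        "will_bootstrap": auto and not is_own_seed,
    }


if __name__ == "__main__":
    print(bootstrap_plan(SAMPLE_CONF))
```

With the sample above the node is its own seed, so `will_bootstrap` comes out false despite AutoBootstrap being true, which matches the message Ran paraphrases.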
Re: Bootstrapping taking long
https://issues.apache.org/jira/browse/CASSANDRA-1676

You have to use at least 0.6.7.

On Wed, Jan 5, 2011 at 4:19 PM, Edward Capriolo edlinuxg...@gmail.com wrote:

On Wed, Jan 5, 2011 at 10:05 AM, Ran Tavory ran...@gmail.com wrote:

In storage-conf I see this comment [1], from which I understand that the recommended way to bootstrap a new node is to set AutoBootstrap=true and remove the node from its own seeds list. Moreover, I did try to set AutoBootstrap=true and have the node in its own seeds list, but it would not bootstrap. I don't recall the exact message, but it was something like "I found myself in the seeds list, therefore I'm not going to bootstrap even though AutoBootstrap is true."
Re: Bootstrapping taking long
@Thibaut wrong email? Or how is "Avoid dropping messages off the client request path" (CASSANDRA-1676) related to the bootstrap questions I had?

On Wed, Jan 5, 2011 at 5:23 PM, Thibaut Britz thibaut.br...@trendiction.com wrote:

https://issues.apache.org/jira/browse/CASSANDRA-1676

You have to use at least 0.6.7.
Re: Bootstrapping taking long
Had the same problem a while ago. Upgrading solved the problem (I don't know if you have to redeploy your cluster, though).

http://www.mail-archive.com/user@cassandra.apache.org/msg07106.html

On Wed, Jan 5, 2011 at 4:29 PM, Ran Tavory ran...@gmail.com wrote:

@Thibaut wrong email? Or how is "Avoid dropping messages off the client request path" (CASSANDRA-1676) related to the bootstrap questions I had?
Re: Bootstrapping taking long
OK, thanks, so I see we had the same problem (I too had multiple keyspaces, not that I know why it matters to the problem at hand), and I see that by upgrading to 0.6.7 you solved your problem (I didn't try it; I had a different workaround). But frankly, I don't understand how https://issues.apache.org/jira/browse/CASSANDRA-1676 would relate to the stuck bootstrap problem (I'm not saying that it doesn't, I'd just like to understand why...)

On Wed, Jan 5, 2011 at 5:42 PM, Thibaut Britz thibaut.br...@trendiction.com wrote:

Had the same problem a while ago. Upgrading solved the problem (I don't know if you have to redeploy your cluster, though).

http://www.mail-archive.com/user@cassandra.apache.org/msg07106.html
Re: Bootstrapping taking long
CASSANDRA-1676 says "Avoid dropping messages off the client request path." Bootstrap messages are off the client request path. So, if some of the nodes involved were loaded enough that they were dropping messages older than RPC_TIMEOUT to cope, the cluster could lose part of the bootstrap communication permanently.

--
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support
http://riptano.com
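To make the failure mode concrete, here is a minimal Python sketch of the idea, not Cassandra's actual code. The message kinds, the `drain` function and the `drop_all_stale` switch are hypothetical names made up for illustration; the 10-second timeout is the RpcTimeoutInMillis value mentioned elsewhere in this thread. It assumes a loaded node sheds any queued message older than the timeout, and contrasts that with shedding only client-path messages.

```python
from collections import deque
from dataclasses import dataclass

RPC_TIMEOUT_MS = 10_000  # the 10k RpcTimeoutInMillis value mentioned in the thread


@dataclass
class Message:
    kind: str         # hypothetical labels: "client_read" or "bootstrap"
    enqueued_ms: int  # time the message entered the queue


def drain(queue, now_ms, drop_all_stale):
    """Process queued messages; return (handled, dropped) lists of kinds.

    drop_all_stale=True models shedding every message older than the
    timeout. False models shedding only client-path messages, whose
    callers have timed out and can retry anyway.
    """
    handled, dropped = [], []
    while queue:
        msg = queue.popleft()
        stale = now_ms - msg.enqueued_ms > RPC_TIMEOUT_MS
        on_client_path = msg.kind.startswith("client")
        if stale and (drop_all_stale or on_client_path):
            dropped.append(msg.kind)
        else:
            handled.append(msg.kind)
    return handled, dropped


def demo(drop_all_stale):
    """Two messages queued at t=0 and processed 15 seconds later."""
    queue = deque([Message("client_read", 0), Message("bootstrap", 0)])
    return drain(queue, now_ms=15_000, drop_all_stale=drop_all_stale)
```

A dropped client read just times out and gets retried by the client. Nothing retries a dropped bootstrap message in this model, so the joining node would wait indefinitely, which matches the symptom of an idle node stuck in bootstrapping mode.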
Re: Bootstrapping taking long
I see. Thanks for clarifying, Jonathan.

-- /Ran
Re: Bootstrapping taking long
In my experience, most of the time it takes for a node to join the cluster is spent on anticompaction on the other nodes. The streaming part is very fast. Check the other nodes' logs to see if any node is doing anticompaction. I don't remember how much data I had in the cluster when I needed to add/remove nodes. I do remember that it took a few hours. The node will join the ring only when it finishes the bootstrap.

Shimi

On Tue, Jan 4, 2011 at 12:28 PM, Ran Tavory ran...@gmail.com wrote:

I asked the same question on IRC but no luck there, everyone's asleep ;)...

Using 0.6.6, I'm adding a new node to the cluster. It starts out fine but then gets stuck in the bootstrapping state for too long. More than an hour and still counting.

$ bin/nodetool -p 9004 -h localhost streams
Mode: Bootstrapping
Not sending any streams.
Not receiving any streams.

It seems to have streamed data from other nodes, and indeed the load is non-zero, but I'm not clear what's keeping it from finishing right now.

$ bin/nodetool -p 9004 -h localhost info
51042355038140769519506191114765231716
Load : 22.49 GB
Generation No: 1294133781
Uptime (seconds) : 1795
Heap Memory (MB) : 315.31 / 6117.00

nodetool ring does not list this new node in the ring. nodetool can happily talk to the new node; the node just isn't listing itself as a member of the ring. This is expected while the node is still bootstrapping, so the question remains how long the bootstrap might take and whether it is stuck.

The data isn't huge, so I find it hard to believe that streaming or anticompaction are the bottlenecks. I have ~20G on each node and the new node already has just about that, so it seems that all data had already been streamed to it successfully, or at least most of the data... So what is it waiting for now? (same question, rephrased... ;)

I tried:
1. Restarting the new node. No good. All logs seem normal but at the end the node is still in bootstrap mode.
2. As someone suggested, I increased the rpc timeout from 10k to 30k (RpcTimeoutInMillis), but that didn't seem to help. I did this only on the new node. Should I have done that on all (old) nodes as well? Or maybe only on the ones that were supposed to stream data to that node?
3. Logging level is at DEBUG now, but nothing interesting is going on except for occasional messages such as [1] or [2].

So the question is: what's keeping the new node from finishing the bootstrap, and how can I check its status?

Thanks

[1] DEBUG [Timer-1] 2011-01-04 05:21:24,402 LoadDisseminator.java (line 36) Disseminating load info ...
[2] DEBUG [RMI TCP Connection(22)-192.168.252.88] 2011-01-04 05:12:48,033 StorageService.java (line 1189) computing ranges for 28356863910078205288614550619314017621, 56713727820156410577229101238628035242, 85070591730234615865843651857942052863, 113427455640312821154458202477256070484, 141784319550391026443072753096570088105, 170141183460469231731687303715884105727

-- /Ran
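On the "how can I check its status" part, here is a rough Python sketch that classifies the text printed by the `nodetool ... streams` command quoted above. It relies only on the three strings shown in this thread ("Mode:", "Not sending any streams.", "Not receiving any streams."); the function names are hypothetical, and other releases may print something different.

```python
def parse_streams_output(text):
    """Return (mode, idle) from the text printed by `nodetool streams`.

    mode is the word following "Mode:" (or None if absent); idle is True
    when the output reports neither sending nor receiving streams.
    """
    mode = None
    if "Mode:" in text:
        # Search by marker rather than by line, so flattened output works too.
        words = text.split("Mode:", 1)[1].split()
        mode = words[0] if words else None
    idle = ("Not sending any streams." in text
            and "Not receiving any streams." in text)
    return mode, idle


def looks_stuck(text):
    """True when the node reports Bootstrapping but moves no streams."""
    mode, idle = parse_streams_output(text)
    return mode == "Bootstrapping" and idle
```

One idle sample proves little, since a node can be briefly idle between streams. Capturing the command's output every few minutes and flagging the node only after `looks_stuck` has stayed true for a long stretch would be a more cautious use.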
Re: Bootstrapping taking long
Thanks Shimi. So indeed, anticompaction was run on one of the other nodes from the same DC, but to my understanding it has already ended. A few hours ago... I see plenty of log messages such as [1], which ended a couple of hours ago, and I've seen the new node streaming and accepting the data from the node which performed the anticompaction. So far it was normal, so it seemed that the data is in its right place. But now the new node seems sort of stuck. None of the other nodes is anticompacting right now or has been anticompacting since then. The new node's CPU is close to zero and its iostats are almost zero, so I can't find another bottleneck that would keep it hanging.

On IRC someone suggested I retry joining this node, e.g. decommission it and rejoin it again. I'll try it now...

[1]
INFO [COMPACTION-POOL:1] 2011-01-04 04:04:09,721 CompactionManager.java (line 338) AntiCompacting [org.apache.cassandra.io.SSTableReader(path='/outbrain/cassandra/data/outbrain_kvdb/KvAds-6449-Data.db')]
INFO [COMPACTION-POOL:1] 2011-01-04 04:34:18,683 CompactionManager.java (line 338) AntiCompacting [org.apache.cassandra.io.SSTableReader(path='/outbrain/cassandra/data/outbrain_kvdb/KvImpressions-3874-Data.db'),org.apache.cassandra.io.SSTableReader(path='/outbrain/cassandra/data/outbrain_kvdb/KvImpressions-3873-Data.db'),org.apache.cassandra.io.SSTableReader(path='/outbrain/cassandra/data/outbrain_kvdb/KvImpressions-3876-Data.db')]
INFO [COMPACTION-POOL:1] 2011-01-04 04:34:19,132 CompactionManager.java (line 338) AntiCompacting [org.apache.cassandra.io.SSTableReader(path='/outbrain/cassandra/data/outbrain_kvdb/KvRatings-951-Data.db'),org.apache.cassandra.io.SSTableReader(path='/outbrain/cassandra/data/outbrain_kvdb/KvRatings-976-Data.db'),org.apache.cassandra.io.SSTableReader(path='/outbrain/cassandra/data/outbrain_kvdb/KvRatings-978-Data.db')]
INFO [COMPACTION-POOL:1] 2011-01-04 04:34:26,486 CompactionManager.java (line 338) AntiCompacting [org.apache.cassandra.io.SSTableReader(path='/outbrain/cassandra/data/outbrain_kvdb/KvAds-6449-Data.db')]
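For checking when anticompaction last ran on a node, a small Python sketch like the following could pull the timestamps and SSTable paths out of log entries shaped like the ones in [1]. The regular expressions are written against those sample lines only and are an assumption about the 0.6 log format in general; the function names are hypothetical.

```python
import re
from datetime import datetime

# Matches the prefix of the AntiCompacting entries quoted in this thread.
LOG_ENTRY = re.compile(
    r"INFO \[COMPACTION-POOL:\d+\] "
    r"(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),\d{3} "
    r"CompactionManager\.java \(line \d+\) AntiCompacting "
)
SSTABLE_PATH = re.compile(r"path='([^']+)'")


def anticompaction_events(log_text):
    """Return [(timestamp, [sstable paths])], one item per AntiCompacting entry."""
    matches = list(LOG_ENTRY.finditer(log_text))
    events = []
    for i, match in enumerate(matches):
        # An entry's body runs up to the start of the next matching entry.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(log_text)
        body = log_text[match.end():end]
        when = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S")
        events.append((when, SSTABLE_PATH.findall(body)))
    return events


def last_anticompaction(log_text):
    """Timestamp of the most recent AntiCompacting entry, or None."""
    events = anticompaction_events(log_text)
    return max(when for when, _ in events) if events else None
```

Comparing `last_anticompaction` across the old nodes' logs with the current time gives a quick read on whether any of them could still be preparing data for the new node.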
Re: Bootstrapping taking long
Running nodetool decommission didn't help. Actually, the node refused to decommission itself (because it wasn't part of the ring). So I simply stopped the process, deleted all the data directories and started it again. It worked in the sense that the node bootstrapped again, but as before, after it had finished moving the data nothing happened for a long time (I'm still waiting, but nothing seems to be happening).

Any hints on how to analyze a stuck bootstrapping node?

Thanks
Re: Bootstrapping taking long
In 0.6, locate the node doing anti-compaction and look in the "streams" subdirectory of the keyspace data dir to monitor the anti-compaction progress (it puts the new SSTables for the bootstrapping node in there).
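A small Python sketch of that check, assuming the layout described above, i.e. a "streams" directory directly under the keyspace data directory. The function name is hypothetical, and the path in the usage note is pieced together from the log lines elsewhere in this thread, so treat it as a guess.

```python
import os


def streams_dir_usage(keyspace_data_dir):
    """Return (file_count, total_bytes) for <keyspace_data_dir>/streams.

    Returns (0, 0) when the streams directory does not exist.
    """
    streams_dir = os.path.join(keyspace_data_dir, "streams")
    if not os.path.isdir(streams_dir):
        return 0, 0
    count = 0
    total = 0
    for name in os.listdir(streams_dir):
        path = os.path.join(streams_dir, name)
        if os.path.isfile(path):
            count += 1
            total += os.path.getsize(path)
    return count, total
```

Calling this every minute or so on something like `/outbrain/cassandra/data/outbrain_kvdb` and watching whether the byte count changes would show whether anti-compaction is still writing files for the new node. A steady (0, 0) suggests the source node has nothing pending.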
Re: Bootstrapping taking long
You will have something new to talk about in your talk tomorrow :)

You said that the anticompaction was only on a single node? I think that your new node should get data from at least two other nodes (depending on the replication factor). Maybe the problem is not in the new node. In older versions (I think prior to 0.6.3) there was a case of stuck bootstrap that required restarting the new node and the nodes which were supposed to stream data to it. As far as I remember this case was resolved. I haven't seen this problem since then.

Shimi
Re: Bootstrapping taking long
Thanks Jake, but unfortunately the streams directory is empty, so I don't think that any of the nodes is anti-compacting data right now or has been in the past 5 hours. It seems that all the data was already transferred to the joining host, but the joining node, after having received the data, still remains in bootstrapping mode and does not join the cluster. I'm not sure that *all* data was transferred (perhaps other nodes need to transfer more data), but nothing is actually happening, so I assume all of it has been moved.

Perhaps it's a configuration error on my part. Should I use AutoBootstrap=true? Is there anything else I should look out for in the configuration file or elsewhere?
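Since the configuration question keeps coming up, here is a hedged Python sketch that flags the two settings discussed in this thread: AutoBootstrap, and whether the node appears in its own seeds list. The element names (`AutoBootstrap`, `ListenAddress`, `Seeds`/`Seed` under a single root element) are my assumption about the 0.6 storage-conf.xml layout, and the function name is hypothetical. The second warning only restates what was reported for 0.6.6 elsewhere in this thread.

```python
import xml.etree.ElementTree as ET


def bootstrap_config_warnings(storage_conf_xml):
    """Return a list of warnings for a storage-conf.xml document (as a string)."""
    root = ET.fromstring(storage_conf_xml)
    # Assumed element names; adjust if your storage-conf.xml differs.
    auto_text = (root.findtext("AutoBootstrap") or "false").strip().lower()
    auto = auto_text == "true"
    listen = (root.findtext("ListenAddress") or "").strip()
    seeds = [(seed.text or "").strip() for seed in root.findall("Seeds/Seed")]

    warnings = []
    if not auto:
        warnings.append("AutoBootstrap is not true; a new node will not bootstrap")
    if auto and listen and listen in seeds:
        warnings.append(
            "node lists itself as a seed; 0.6.6 was reported to skip bootstrap"
        )
    return warnings
```

The comparison is a plain string match, so a hostname in one place and an IP address in the other will not be detected as the same node.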
Re: Bootstrapping taking long
I'm still at a loss. I haven't been able to resolve this. I tried adding another node at a different location on the ring, but this node too remains stuck in the bootstrapping state for many hours without any of the other nodes being busy with anti-compaction or anything else. I don't know what's keeping it from finishing the bootstrap: no CPU, no IO, files were already streamed, so what is it waiting for?

I read the release notes of 0.6.7 and 0.6.8 and there didn't seem to be anything addressing a similar issue, so I figured there was no point in upgrading. But let me know if you think there is. Or any other advice...

On Tuesday, January 4, 2011, Ran Tavory ran...@gmail.com wrote:

Thanks Jake, but unfortunately the streams directory is empty, so I don't think that any of the nodes is anti-compacting data right now or has been in the past 5 hours. It seems that all the data was already transferred to the joining host, but the joining node, after having received the data, still remains in bootstrapping mode and does not join the cluster. I'm not sure that *all* data was transferred (perhaps other nodes need to transfer more data), but nothing is actually happening, so I assume all of it has been moved.

Perhaps it's a configuration error on my part. Should I use AutoBootstrap=true? Anything else I should look out for in the configuration file or elsewhere?

On Tue, Jan 4, 2011 at 4:08 PM, Jake Luciani jak...@gmail.com wrote:

In 0.6, locate the node doing anti-compaction and look in the streams subdirectory in the keyspace data dir to monitor the anti-compaction progress (it puts new SSTables for the bootstrapping node in there).

On Tue, Jan 4, 2011 at 8:01 AM, Ran Tavory ran...@gmail.com wrote:

Running nodetool decommission didn't help. Actually the node refused to decommission itself (because it wasn't part of the ring). So I simply stopped the process, deleted all the data directories and started it again. It worked in the sense that the node bootstrapped again, but as before, after it had finished moving the data nothing happened for a long time (I'm still waiting, but nothing seems to be happening). Any hints on how to analyze a stuck bootstrapping node? Thanks.

On Tue, Jan 4, 2011 at 1:51 PM, Ran Tavory ran...@gmail.com wrote:

Thanks Shimi, so indeed anticompaction was run on one of the other nodes from the same DC, but to my understanding it has already ended. A few hours ago... I saw plenty of log messages such as [1], which ended a couple of hours ago, and I've seen the new node streaming and accepting the data from the node which performed the anticompaction. So far it was normal, so it seemed that the data is in its right place. But now the new node seems sort of stuck. None of the other nodes is anticompacting right now or has been anticompacting since then. The new node's CPU is close to zero and its iostats are almost zero, so I can't find another bottleneck that would keep it hanging.

On IRC someone suggested I retry joining this node, e.g. decommission it and rejoin it again. I'll try it now...

[1]
INFO [COMPACTION-POOL:1] 2011-01-04 04:04:09,721 CompactionManager.java (line 338) AntiCompacting [org.apache.cassandra.io.SSTableReader(path='/outbrain/cassandra/data/outbrain_kvdb/KvAds-6449-Data.db')]
INFO [COMPACTION-POOL:1] 2011-01-04 04:34:18,683 CompactionManager.java (line 338) AntiCompacting [org.apache.cassandra.io.SSTableReader(path='/outbrain/cassandra/data/outbrain_kvdb/KvImpressions-3874-Data.db'),org.apache.cassandra.io.SSTableReader(path='/outbrain/cassandra/data/outbrain_kvdb/KvImpressions-3873-Data.db'),org.apache.cassandra.io.SSTableReader(path='/outbrain/cassandra/data/outbrain_kvdb/KvImpressions-3876-Data.db')]
INFO [COMPACTION-POOL:1] 2011-01-04 04:34:19,132 CompactionManager.java (line 338) AntiCompacting [org.apache.cassandra.io.SSTableReader(path='/outbrain/cassandra/data/outbrain_kvdb/KvRatings-951-Data.db'),org.apache.cassandra.io.SSTableReader(path='/outbrain/cassandra/data/outbrain_kvdb/KvRatings-976-Data.db'),org.apache.cassandra.io.SSTableReader(path='/outbrain/cassandra/data/outbrain_kvdb/KvRatings-978-Data.db')]
INFO [COMPACTION-POOL:1] 2011-01-04 04:34:26,486 CompactionManager.java (line 338) AntiCompacting [org.apache.cassandra.io.SSTableReader(path='/outbrain/cassandra/data/outbrain_kvdb/KvAds-6449-Data.db')]

On Tue, Jan 4, 2011 at 12:45 PM, shimi shim...@gmail.com wrote:

In my experience, most of the time it takes for a node to join the cluster is spent on the anticompaction on the other nodes. The streaming part is very fast. Check the other nodes' logs to see if there is any node doing anticompaction. I don't remember how much data I had in the cluster when I needed to add/remove nodes. I do remember that it took a few hours. The node will join the ring only once it finishes the bootstrap.

-- /Ran
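Shimi's advice to check the other nodes' logs for anticompaction can be roughly automated. The Python sketch below scans log lines for `AntiCompacting` entries and reports the most recent timestamp. It assumes the log line layout shown in [1] above (date, time with milliseconds, then the message); the function name `last_anticompaction` is hypothetical and not part of Cassandra.

```python
import re
from datetime import datetime
from typing import Iterable, Optional

# Matches "2011-01-04 04:04:09,721 ... AntiCompacting" as seen in the quoted log.
_ANTICOMPACT = re.compile(
    r"(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),\d{3} .*AntiCompacting"
)


def last_anticompaction(log_lines: Iterable[str]) -> Optional[datetime]:
    """Return the timestamp of the latest AntiCompacting entry, or None."""
    latest = None
    for line in log_lines:
        match = _ANTICOMPACT.search(line)
        if match is None:
            continue
        stamp = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S")
        if latest is None or stamp > latest:
            latest = stamp
    return latest


if __name__ == "__main__":
    sample = [
        "INFO [COMPACTION-POOL:1] 2011-01-04 04:04:09,721 "
        "CompactionManager.java (line 338) AntiCompacting [...]",
        "INFO [COMPACTION-POOL:1] 2011-01-04 04:34:26,486 "
        "CompactionManager.java (line 338) AntiCompacting [...]",
    ]
    print(last_anticompaction(sample))
```

If the latest entry on every node is hours old, as described in this thread, anticompaction is probably not what the joining node is waiting for.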
Re: Bootstrapping taking long
Does the new node have itself in the list of seeds per chance? This could cause some issues if so.
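One way to answer the seeds question above without reading the file by eye is to parse storage-conf.xml. The Python sketch below is only an illustration: it assumes the seeds are listed as `<Seed>` elements somewhere in the file, and both function names and the addresses in the sample are made up.

```python
import xml.etree.ElementTree as ET
from typing import List


def seeds_from_config(xml_text: str) -> List[str]:
    """Collect the text of every <Seed> element in the given config XML."""
    root = ET.fromstring(xml_text)
    return [seed.text.strip() for seed in root.iter("Seed") if seed.text]


def lists_itself_as_seed(xml_text: str, own_address: str) -> bool:
    """Return True if own_address appears among the configured seeds."""
    return own_address in seeds_from_config(xml_text)


if __name__ == "__main__":
    # Hypothetical fragment; a real storage-conf.xml has many more settings.
    sample = (
        "<Storage>"
        "<Seeds><Seed>10.0.0.1</Seed><Seed>10.0.0.2</Seed></Seeds>"
        "</Storage>"
    )
    print(lists_itself_as_seed(sample, "10.0.0.2"))
```

The comparison is a plain string match, so a node listed by hostname in one place and by IP address in another would not be detected.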
Re: Bootstrapping taking long
The new node does not see itself as part of the ring; it sees all the others but not itself, so from that perspective the view is consistent. The only problem is that the node never finishes bootstrapping. It stays in this state for hours (it's been 20 hours now...).

$ bin/nodetool -p 9004 -h localhost streams
Mode: Bootstrapping
Not sending any streams.
Not receiving any streams.

On Wed, Jan 5, 2011 at 1:20 AM, Nate McCall n...@riptano.com wrote:

Does the new node have itself in the list of seeds per chance? This could cause some issues if so.