Re: Bootstrapping taking long

2011-01-05 Thread David Boxenhorn

My nodes all have themselves in their list of seeds - always did - and
everything works. (You may ask why I did this. I don't know, I must have
copied it from an example somewhere.)

On Wed, Jan 5, 2011 at 9:42 AM, Ran Tavory ran...@gmail.com wrote:

 I was able to make the node join the ring, but I'm confused.
 When I first added the node, it was not in its own seeds list. AFAIK this
 is how it's supposed to be. It was able to transfer all the data to itself
 from the other nodes, but then it stayed in the bootstrapping state.
 So what I did (and I don't know why it works) was add this node to the
 seeds list in its own storage-conf.xml file and restart the server. Only
 then did I finally see it in the ring.
 If I add the node to its own seeds list when it first joins, it does not
 join the ring, but doing it in two phases worked.
 So it's either my misunderstanding or a bug...


 On Wed, Jan 5, 2011 at 7:14 AM, Ran Tavory ran...@gmail.com wrote:

 The new node does not see itself as part of the ring. It sees all the
 others but not itself, so from that perspective the view is consistent.
 The only problem is that the node never finishes bootstrapping. It stays in
 this state for hours (it's been 20 hours now...)


 $ bin/nodetool -p 9004 -h localhost streams
 Mode: Bootstrapping
 Not sending any streams.
 Not receiving any streams.


 On Wed, Jan 5, 2011 at 1:20 AM, Nate McCall n...@riptano.com wrote:

 Does the new node have itself in the list of seeds, by any chance? This
 could cause some issues if so.

 On Tue, Jan 4, 2011 at 4:10 PM, Ran Tavory ran...@gmail.com wrote:
  I'm still at a loss. I haven't been able to resolve this. I tried
  adding another node at a different location on the ring, but this node
  too remains stuck in the bootstrapping state for many hours without
  any of the other nodes being busy with anti-compaction or anything
  else. I don't know what's keeping it from finishing the bootstrap: no
  CPU, no I/O, and the files were already streamed, so what is it waiting for?
  I read the release notes of 0.6.7 and 0.6.8 and there didn't seem to
  be anything addressing a similar issue, so I figured there was no point
  in upgrading. But let me know if you think there is.
  Or any other advice...
 
  On Tuesday, January 4, 2011, Ran Tavory ran...@gmail.com wrote:
  Thanks Jake, but unfortunately the streams directory is empty, so I
  don't think that any of the nodes is anti-compacting data right now or
  has been in the past 5 hours. It seems that all the data was already
  transferred to the joining host, but the joining node, after having
  received the data, still remains in bootstrapping mode and does not join
  the cluster. I'm not sure that *all* data was transferred (perhaps other
  nodes need to transfer more data), but nothing is actually happening, so
  I assume it has all been moved.
  Perhaps it's a configuration error on my part. Should I use
  AutoBootstrap=true? Anything else I should look out for in the
  configuration file or elsewhere?
 
 
  On Tue, Jan 4, 2011 at 4:08 PM, Jake Luciani jak...@gmail.com wrote:

  In 0.6, locate the node doing anti-compaction and look in the streams
  subdirectory in the keyspace data dir to monitor the anti-compaction
  progress (it puts new SSTables for the bootstrapping node in there).
 
 
  On Tue, Jan 4, 2011 at 8:01 AM, Ran Tavory ran...@gmail.com wrote:
 
 
  Running nodetool decommission didn't help. Actually, the node refused
  to decommission itself (because it wasn't part of the ring). So I simply
  stopped the process, deleted all the data directories and started it
  again. It worked in the sense that the node bootstrapped again, but as
  before, after it had finished moving the data nothing happened for a long
  time (I'm still waiting, but nothing seems to be happening).

  Any hints on how to analyze a stuck bootstrapping node? Thanks.
  On Tue, Jan 4, 2011 at 1:51 PM, Ran Tavory ran...@gmail.com wrote:
  Thanks Shimi. Indeed, anticompaction was run on one of the other nodes
  from the same DC, but to my understanding it has already ended, a few
  hours ago.

  I see plenty of log messages such as [1], which ended a couple of hours
  ago, and I've seen the new node streaming and accepting the data from the
  node which performed the anticompaction. So far it looked normal, so it
  seemed that the data is in its right place. But now the new node seems
  sort of stuck. None of the other nodes is anticompacting right now or has
  been anticompacting since then.

  The new node's CPU is close to zero and its iostats are almost zero, so I
  can't find another bottleneck that would keep it hanging.
  On IRC someone suggested that I retry joining this node, e.g.
  decommission it and rejoin it. I'll try it now...

  [1] INFO [COMPACTION-POOL:1] 2011-01-04 04:04:09,721
 CompactionManager.java (line 338) AntiCompacting
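
The check Ran ran by hand with `nodetool streams` could be automated along the following lines. This is only a sketch: the function names are hypothetical, and the only output lines it recognizes are the three quoted in this thread. Anything else in the output is ignored.

```python
def parse_streams_output(text):
    """Pick the mode and the two 'idle' markers out of `nodetool streams` text."""
    status = {"mode": None, "idle_send": False, "idle_receive": False}
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("Mode:"):
            status["mode"] = line.split(":", 1)[1].strip()
        elif line == "Not sending any streams.":
            status["idle_send"] = True
        elif line == "Not receiving any streams.":
            status["idle_receive"] = True
    return status


def looks_stuck(text):
    """True when the node says it is bootstrapping but no streams are moving."""
    status = parse_streams_output(text)
    return (status["mode"] == "Bootstrapping"
            and status["idle_send"]
            and status["idle_receive"])


# The output quoted in the thread.
SAMPLE = """Mode: Bootstrapping
Not sending any streams.
Not receiving any streams.
"""

print(looks_stuck(SAMPLE))  # prints True
```

A single idle reading proves little, since a node may simply be between transfers. In the thread the state persisted for hours, so a caller would presumably want several consecutive idle readings before raising an alarm.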

Re: Bootstrapping taking long

2011-01-05 Thread Jake Luciani

Well, your ring issues don't make sense to me; the seed list should be the
same across the cluster.
I'm just thinking of other things to try. Non-bootstrapped nodes should join
the ring instantly, but reads will fail if you aren't using quorum.


On Wed, Jan 5, 2011 at 8:51 AM, Ran Tavory ran...@gmail.com wrote:

 I haven't tried repair. Should I?
 On Jan 5, 2011 3:48 PM, Jake Luciani jak...@gmail.com wrote:
  Have you tried not bootstrapping but setting the token and manually
  calling repair?

  On Wed, Jan 5, 2011 at 7:07 AM, Ran Tavory ran...@gmail.com wrote:

  My conclusion is lame: I tried this on several hosts and saw the same
  behavior. The only way I was able to join new nodes was to first start
  them when they are *not in* their own seeds list and, after they finish
  transferring the data, restart them with themselves *in* their own seeds
  list. After doing that the node would join the ring.
  This is either my misunderstanding or a bug, but the only place I found
  it documented stated that the new node should not be in its own seeds
  list.
  Version 0.6.6.
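
The second phase of the workaround Ran describes (adding the node to its own seeds list before the restart) might be scripted roughly as below. This is a sketch under assumptions, not a recommended procedure: it assumes storage-conf.xml keeps its seeds as `<Seed>` elements inside a `<Seeds>` element directly under the root, and the function name and IP addresses are made up for illustration.

```python
import xml.etree.ElementTree as ET


def add_seed(conf_xml, address):
    """Return conf_xml with `address` present in the <Seeds> list (idempotent)."""
    root = ET.fromstring(conf_xml)
    seeds = root.find("Seeds")
    if seeds is None:
        # Assumption: a missing <Seeds> element can simply be created.
        seeds = ET.SubElement(root, "Seeds")
    current = [(seed.text or "").strip() for seed in seeds.findall("Seed")]
    if address not in current:
        ET.SubElement(seeds, "Seed").text = address
    return ET.tostring(root, encoding="unicode")


# Hypothetical two-seed configuration; 10.0.0.3 stands in for the new node.
BEFORE = ("<Storage><Seeds><Seed>10.0.0.1</Seed>"
          "<Seed>10.0.0.2</Seed></Seeds></Storage>")
AFTER = add_seed(BEFORE, "10.0.0.3")
print(AFTER)
```

Note that `ElementTree` drops XML comments when parsing, so the output loses the explanatory comments of a real storage-conf.xml. It is better treated as something to diff against the original than as a file to write back blindly.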
 

Re: Bootstrapping taking long

2011-01-05 Thread Ran Tavory

In storage-conf I see this comment [1], from which I understand that the
recommended way to bootstrap a new node is to set AutoBootstrap=true and
remove the node from its own seeds list.
Moreover, I did try to set AutoBootstrap=true and have the node in its own
seeds list, but it would not bootstrap. I don't recall the exact message, but
it was something like "I found myself in the seeds list, therefore I'm not
going to bootstrap even though AutoBootstrap is true".

[1]
  <!--
   ~ Turn on to make new [non-seed] nodes automatically migrate the right data
   ~ to themselves.  (If no InitialToken is specified, they will pick one
   ~ such that they will get half the range of the most-loaded node.)
   ~ If a node starts up without bootstrapping, it will mark itself bootstrapped
   ~ so that you can't subsequently accidently bootstrap a node with
   ~ data on it.  (You can reset this by wiping your data and commitlog
   ~ directories.)
   ~
   ~ Off by default so that new clusters and upgraders from 0.4 don't
   ~ bootstrap immediately.  You should turn this on when you start adding
   ~ new nodes to a cluster that already has data on it.  (If you are upgrading
   ~ from 0.4, start your cluster with it off once before changing it to true.
   ~ Otherwise, no data will be lost but you will incur a lot of unnecessary
   ~ I/O before your cluster starts up.)
  -->
  <AutoBootstrap>false</AutoBootstrap>
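
To spot the combination described above (AutoBootstrap turned on while the node lists itself as a seed) before starting a node, a check along these lines might help. It is a sketch under assumptions: `<AutoBootstrap>` and `<Seeds>` are taken to sit directly under the root element of storage-conf.xml, the function name and addresses are hypothetical, and the node is compared with the seeds by plain string equality, which will miss a hostname that resolves to the same address.

```python
import xml.etree.ElementTree as ET


def bootstrap_config_report(conf_xml, node_address):
    """Summarize the AutoBootstrap and seeds settings relevant to a joining node."""
    root = ET.fromstring(conf_xml)
    # The quoted comment says the option is off by default, so absent means false.
    auto_text = root.findtext("AutoBootstrap") or "false"
    auto_bootstrap = auto_text.strip().lower() == "true"
    seeds = [seed.text.strip() for seed in root.iter("Seed") if seed.text]
    self_in_seeds = node_address in seeds
    return {
        "auto_bootstrap": auto_bootstrap,
        "seeds": seeds,
        "self_in_seeds": self_in_seeds,
        # The case Ran reports: the node declines to bootstrap.
        "conflict": auto_bootstrap and self_in_seeds,
    }


# Hypothetical configuration fragment, not taken from the thread.
SAMPLE_CONF = """<Storage>
  <AutoBootstrap>true</AutoBootstrap>
  <Seeds>
    <Seed>10.0.0.1</Seed>
    <Seed>10.0.0.2</Seed>
  </Seeds>
</Storage>"""

print(bootstrap_config_report(SAMPLE_CONF, "10.0.0.2"))
```

The `conflict` flag only encodes what Ran reports seeing on 0.6.6. Whether other versions behave the same way is not established in this thread.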

On Wed, Jan 5, 2011 at 4:58 PM, David Boxenhorn da...@lookin2.com wrote:

 If the seed list should be the same across the cluster, that means that
 nodes *should* have themselves as a seed. If that doesn't work for Ran, then
 that is the first problem, no?



Re: Bootstrapping taking long

2011-01-05 Thread Thibaut Britz

https://issues.apache.org/jira/browse/CASSANDRA-1676

you have to use at least 0.6.7



Re: Bootstrapping taking long

2011-01-05 Thread Ran Tavory

@Thibaut wrong email? Or how is "Avoid dropping messages off the client
request path" (CASSANDRA-1676) related to the bootstrap questions I had?


Re: Bootstrapping taking long

2011-01-05 Thread Thibaut Britz

I had the same problem a while ago. Upgrading solved it (I don't know
whether you have to redeploy your cluster, though).

http://www.mail-archive.com/user@cassandra.apache.org/msg07106.html



Re: Bootstrapping taking long

2011-01-05 Thread Ran Tavory

OK, thanks. So I see we had the same problem (I too had multiple keyspaces,
not that I know why it matters to the problem at hand), and I see that
upgrading to 0.6.7 solved your problem (I didn't try it; I had a different
workaround). But frankly, I don't understand how
https://issues.apache.org/jira/browse/CASSANDRA-1676 would relate to the
stuck bootstrap problem (I'm not saying that it doesn't, I'd just like to
understand why...)


On Wed, Jan 5, 2011 at 5:42 PM, Thibaut Britz thibaut.br...@trendiction.com
 wrote:

 Had the same Problem a while ago. Upgrading solved the problem (Don't know
 if you have to redeploy your cluster though)

 http://www.mail-archive.com/user@cassandra.apache.org/msg07106.html



 On Wed, Jan 5, 2011 at 4:29 PM, Ran Tavory ran...@gmail.com wrote:

 @Thibaut wrong email? Or how's Avoid dropping messages off the client
 request path (CASSANDRA-1676) related to the bootstrap questions I had?


 On Wed, Jan 5, 2011 at 5:23 PM, Thibaut Britz 
 thibaut.br...@trendiction.com wrote:

 https://issues.apache.org/jira/browse/CASSANDRA-1676

 you have to use at least 0.6.7



 On Wed, Jan 5, 2011 at 4:19 PM, Edward Capriolo 
 edlinuxg...@gmail.comwrote:

 On Wed, Jan 5, 2011 at 10:05 AM, Ran Tavory ran...@gmail.com wrote:
  In storage-conf I see this comment [1] from which I understand that
 the
  recommended way to bootstrap a new node is to set AutoBootstrap=true
 and
  remove itself from the seeds list.
  Moreover, I did try to set AutoBootstrap=true and have the node in its
 own
  seeds list, but it would not bootstrap. I don't recall the exact
 message but
  it was something like I found myself in the seeds list therefore I'm
 not
  going to bootstrap even though AutoBootstrap is true.
 
  [1]
  <!--
   ~ Turn on to make new [non-seed] nodes automatically migrate the right data
   ~ to themselves.  (If no InitialToken is specified, they will pick one
   ~ such that they will get half the range of the most-loaded node.)
   ~ If a node starts up without bootstrapping, it will mark itself bootstrapped
   ~ so that you can't subsequently accidently bootstrap a node with
   ~ data on it.  (You can reset this by wiping your data and commitlog
   ~ directories.)
   ~
   ~ Off by default so that new clusters and upgraders from 0.4 don't
   ~ bootstrap immediately.  You should turn this on when you start adding
   ~ new nodes to a cluster that already has data on it.  (If you are upgrading
   ~ from 0.4, start your cluster with it off once before changing it to true.
   ~ Otherwise, no data will be lost but you will incur a lot of unnecessary
   ~ I/O before your cluster starts up.)
  -->
  <AutoBootstrap>false</AutoBootstrap>
  On Wed, Jan 5, 2011 at 4:58 PM, David Boxenhorn da...@lookin2.com
 wrote:
 
  If seed list should be the same across the cluster that means that
 nodes
  *should* have themselves as a seed. If that doesn't work for Ran,
 then that
  is the first problem, no?
 
 
  On Wed, Jan 5, 2011 at 3:56 PM, Jake Luciani jak...@gmail.com
 wrote:
 
  Well your ring issues don't make sense to me, seed list should be
 the
  same across the cluster.
  I'm just thinking of other things to try, non-bootstrapped nodes should
  join the ring instantly but reads will fail if you aren't using quorum.
 
  On Wed, Jan 5, 2011 at 8:51 AM, Ran Tavory ran...@gmail.com
 wrote:
 
  I haven't tried repair.  Should I?
 
  On Jan 5, 2011 3:48 PM, Jake Luciani jak...@gmail.com wrote:
   Have you tried not bootstrapping but setting the token and
 manually
   calling
   repair?
  
   On Wed, Jan 5, 2011 at 7:07 AM, Ran Tavory ran...@gmail.com
 wrote:
  
   My conclusion is lame: I tried this on several hosts and saw the
 same
   behavior, the only way I was able to join new nodes was to first
   start them
   when they are *not in* their own seeds list and after they
   finish transferring the data, then restart them with themselves
 *in*
   their
   own seeds list. After doing that the node would join the ring.
   This is either my misunderstanding or a bug, but the only place
 I
   found it
   documented stated that the new node should not be in its own
 seeds
   list.
   Version 0.6.6.
  
   On Wed, Jan 5, 2011 at 10:35 AM, David Boxenhorn
   da...@lookin2.com wrote:
  
   My nodes all have themselves in their list of seeds - always
 did -
   and
   everything works. (You may ask why I did this. I don't know, I
 must
   have
   copied it from an example somewhere.)
  
   On Wed, Jan 5, 2011 at 9:42 AM, Ran Tavory ran...@gmail.com
 wrote:
  
   I was able to make the node join the ring but I'm confused.
   What I did is, first when adding the node, this node was not
 in the
   seeds
   list of itself. AFAIK this is how it's supposed to be. So it
 was
   able to
   transfer all data to itself from other nodes but then it
 stayed in
   the
   bootstrapping state.
   So what I did (and I don't know why it works), is add this
 node to
   the
   seeds list in its

Re: Bootstrapping taking long

2011-01-05 Thread Jonathan Ellis

1676 says "Avoid dropping messages off the client request path."
Bootstrap messages are off the client request path.  So, if some of
the nodes involved were loaded enough that they were dropping messages
older than RPC_TIMEOUT to cope, they could lose part of the bootstrap
communication permanently.
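
The failure mode can be sketched as a toy model in Python. This is not
Cassandra code: Message, on_client_path, drain and the BOOTSTRAP_TOKEN verb
are hypothetical names made up for illustration, and the 10k timeout is just
the RpcTimeoutInMillis value mentioned elsewhere in this thread. The assumption
is that an overloaded node discards any queued message older than the RPC
timeout. A dropped client request is retried or times out at the client, but a
dropped one-shot bootstrap message is simply gone.

```python
from dataclasses import dataclass

RPC_TIMEOUT_MS = 10_000  # assumed value, taken from the 10k mentioned in the thread


@dataclass
class Message:
    verb: str              # hypothetical label, not a real Cassandra verb name
    enqueued_at_ms: int
    on_client_path: bool   # True for reads/writes issued on behalf of a client


def drain(queue, now_ms, protect_off_path):
    """Return the messages an overloaded node would still process."""
    kept = []
    for msg in queue:
        too_old = now_ms - msg.enqueued_at_ms > RPC_TIMEOUT_MS
        # Without protection every stale message is dropped; with it,
        # only stale messages on the client request path are dropped.
        if too_old and (msg.on_client_path or not protect_off_path):
            continue
        kept.append(msg)
    return kept


queue = [
    Message("READ", 0, True),
    Message("BOOTSTRAP_TOKEN", 0, False),
    Message("READ", 15_000, True),
]
before_fix = [m.verb for m in drain(queue, 20_000, protect_off_path=False)]
after_fix = [m.verb for m in drain(queue, 20_000, protect_off_path=True)]
```

In this model the stale bootstrap message survives only when off-path
messages are protected, which is the behaviour the ticket title describes.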

On Wed, Jan 5, 2011 at 10:01 AM, Ran Tavory ran...@gmail.com wrote:
 OK, thanks, so I see we had the same problem (I too had multiple keyspace,
 not that I know why it matters to the problem at hand) and I see that by
 upgrading to 0.6.7 you solved your problem (I didn't try it, had a different
 workaround) but frankly, I don't understand
 how https://issues.apache.org/jira/browse/CASSANDRA-1676 would relate the
 the stuck bootstrap problem (I'm not saying that it isn't, I'd just like
 to understand why...)

-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support
http://riptano.com

Re: Bootstrapping taking long

2011-01-05 Thread Ran Tavory

I see. Thanks for clarifying, Jonathan.

On Wednesday, January 5, 2011, Jonathan Ellis jbel...@gmail.com wrote:
 1676 says Avoid dropping messages off the client request path.
 Bootstrap messages are off the client requst path.  So, if some of
 the nodes involved were loaded enough that they were dropping messages
 older than RPC_TIMEOUT to cope, it could lose part of the bootstrap
 communication permanently.

 On Wed, Jan 5, 2011 at 10:01 AM, Ran Tavory ran...@gmail.com wrote:
 OK, thanks, so I see we had the same problem (I too had multiple keyspace,
 not that I know why it matters to the problem at hand) and I see that by
 upgrading to 0.6.7 you solved your problem (I didn't try it, had a different
 workaround) but frankly, I don't understand
 how https://issues.apache.org/jira/browse/CASSANDRA-1676 would relate the
 the stuck bootstrap problem (I'm not saying that it isn't, I'd just like
 to understand why...)

 --
 Jonathan Ellis
 Project Chair, Apache Cassandra
 co-founder of Riptano, the source for professional Cassandra support
 http://riptano.com


-- 
/Ran

Re: Bootstrapping taking long

2011-01-04 Thread shimi

In my experience, most of the time it takes for a node to join the cluster is
spent on anticompaction on the other nodes. The streaming part is very fast.
Check the other nodes' logs to see whether any node is doing anticompaction.
I don't remember how much data I had in the cluster when I needed to
add/remove nodes. I do remember that it took a few hours.

The node will join the ring only once it finishes the bootstrap.
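
A rough way to check the logs is sketched below in Python. It assumes the
AntiCompacting entries look like the INFO lines from CompactionManager.java
that 0.6.x writes to system.log (one entry per line); the function name and
the sample text are made up for illustration, and the "[...]" stands in for
the SSTable list.

```python
import re

# Matches INFO entries of the form:
#   INFO [COMPACTION-POOL:1] 2011-01-04 04:04:09,721 CompactionManager.java (line 338) AntiCompacting [...]
ANTICOMPACT = re.compile(
    r"INFO \[COMPACTION-POOL:\d+\] (\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),\d+ "
    r"CompactionManager\.java \(line \d+\) AntiCompacting"
)


def last_anticompaction(log_text):
    """Return the timestamp of the most recent AntiCompacting entry, or None."""
    stamps = ANTICOMPACT.findall(log_text)
    # Timestamps are zero-padded, so string order equals chronological order.
    return max(stamps) if stamps else None


sample = (
    " INFO [COMPACTION-POOL:1] 2011-01-04 04:04:09,721 CompactionManager.java (line 338) AntiCompacting [...]\n"
    "DEBUG [Timer-1] 2011-01-04 05:21:24,402 LoadDisseminator.java (line 36) Disseminating load info ...\n"
    " INFO [COMPACTION-POOL:1] 2011-01-04 04:34:26,486 CompactionManager.java (line 338) AntiCompacting [...]\n"
)
latest = last_anticompaction(sample)
```

If the latest timestamp is hours old on every source node, anticompaction is
probably not what the joining node is waiting for.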

Shimi


On Tue, Jan 4, 2011 at 12:28 PM, Ran Tavory ran...@gmail.com wrote:

 I asked the same question on the IRC but no luck there, everyone's asleep
 ;)...

 Using 0.6.6 I'm adding a new node to the cluster.
 It starts out fine but then gets stuck on the bootstrapping state for too
 long. More than an hour and still counting.

 $ bin/nodetool -p 9004 -h localhost streams
 Mode: Bootstrapping
 Not sending any streams.
 Not receiving any streams.


 It seemed to have streamed data from other nodes and indeed the load is
 non-zero but I'm not clear what's keeping it right now from finishing.

 $ bin/nodetool -p 9004 -h localhost info
 51042355038140769519506191114765231716
 Load : 22.49 GB
 Generation No: 1294133781
 Uptime (seconds) : 1795
 Heap Memory (MB) : 315.31 / 6117.00


 nodetool ring does not list this new node in the ring, although nodetool
 can happily talk to the new node, it's just not listing itself as a member
 of the ring. This is expected when the node is still bootstrapping, so the
 question is still how long might the bootstrap take and whether is it stuck.

 The data ins't huge so I find it hard to believe that streaming or anti
 compaction are the bottlenecks. I have ~20G on each node and the new node
 already has just about that so it seems that all data had already been
 streamed to it successfully, or at least most of the data... So what is it
 waiting for now? (same question, rephrased... ;)

 I tried:
 1. Restarting the new node. No good. All logs seem normal but at the end
 the node is still in bootstrap mode.
 2. As someone suggested I increased the rpc timeout from 10k to 30k
 (RpcTimeoutInMillis) but that didn't seem to help. I did this only on the
 new node. Should I have done that on all (old) nodes as well? Or maybe only
 on the ones that were supposed to stream data to that node.
 3. Logging level at DEBUG now but nothing interesting going on except
 for occasional messages such as [1] or [2]

 So the question is: what's keeping the new node from finishing the
 bootstrap and how can I check its status?
 Thanks

 [1] DEBUG [Timer-1] 2011-01-04 05:21:24,402 LoadDisseminator.java (line 36)
 Disseminating load info ...
 [2] DEBUG [RMI TCP Connection(22)-192.168.252.88] 2011-01-04 05:12:48,033
 StorageService.java (line 1189) computing ranges for
 28356863910078205288614550619314017621,
 56713727820156410577229101238628035242,
  85070591730234615865843651857942052863,
 113427455640312821154458202477256070484,
 141784319550391026443072753096570088105,
 170141183460469231731687303715884105727

 --
 /Ran

Re: Bootstrapping taking long

2011-01-04 Thread Ran Tavory

Thanks Shimi. Indeed, anticompaction was run on one of the other nodes
from the same DC, but to my understanding it already ended a few hours
ago.
I see plenty of log messages such as [1], which ended a couple of hours ago, and
I've seen the new node streaming and accepting the data from the node that
performed the anticompaction. So far that looked normal, so it seemed that the
data is in its right place. But now the new node seems sort of stuck. None of the
other nodes is anticompacting right now, or has been anticompacting since
then.
The new node's CPU is close to zero and its iostats are almost zero, so I can't
find another bottleneck that would keep it hanging.

On the IRC someone suggested I retry joining this node,
e.g. decommission and rejoin it. I'll try it now...


[1]
 INFO [COMPACTION-POOL:1] 2011-01-04 04:04:09,721 CompactionManager.java
(line 338) AntiCompacting
[org.apache.cassandra.io.SSTableReader(path='/outbrain/cassandra/data/outbrain_kvdb/KvAds-6449-Data.db')]
 INFO [COMPACTION-POOL:1] 2011-01-04 04:34:18,683 CompactionManager.java
(line 338) AntiCompacting
[org.apache.cassandra.io.SSTableReader(path='/outbrain/cassandra/data/outbrain_kvdb/KvImpressions-3874-Data.db'),org.apache.cassandra.io.SSTableReader(path='/outbrain/cassandra/data/outbrain_kvdb/KvImpressions-3873-Data.db'),org.apache.cassandra.io.SSTableReader(path='/outbrain/cassandra/data/outbrain_kvdb/KvImpressions-3876-Data.db')]
 INFO [COMPACTION-POOL:1] 2011-01-04 04:34:19,132 CompactionManager.java
(line 338) AntiCompacting
[org.apache.cassandra.io.SSTableReader(path='/outbrain/cassandra/data/outbrain_kvdb/KvRatings-951-Data.db'),org.apache.cassandra.io.SSTableReader(path='/outbrain/cassandra/data/outbrain_kvdb/KvRatings-976-Data.db'),org.apache.cassandra.io.SSTableReader(path='/outbrain/cassandra/data/outbrain_kvdb/KvRatings-978-Data.db')]
 INFO [COMPACTION-POOL:1] 2011-01-04 04:34:26,486 CompactionManager.java
(line 338) AntiCompacting
[org.apache.cassandra.io.SSTableReader(path='/outbrain/cassandra/data/outbrain_kvdb/KvAds-6449-Data.db')]

On Tue, Jan 4, 2011 at 12:45 PM, shimi shim...@gmail.com wrote:

 In my experience most of the time it takes for a node to join the cluster
 is the anticompaction on the other nodes. The streaming part is very fast.
 Check the other nodes logs to see if there is any node doing
 anticompaction.
 I don't remember how much data I had in the cluster when I needed to
 add/remove nodes. I do remember that it took a few hours.

 The node will join the ring only when it will finish the bootstrap.

 Shimi


 On Tue, Jan 4, 2011 at 12:28 PM, Ran Tavory ran...@gmail.com wrote:

 I asked the same question on the IRC but no luck there, everyone's asleep
 ;)...

 Using 0.6.6 I'm adding a new node to the cluster.
 It starts out fine but then gets stuck on the bootstrapping state for too
 long. More than an hour and still counting.

 $ bin/nodetool -p 9004 -h localhost streams
 Mode: Bootstrapping
 Not sending any streams.
 Not receiving any streams.


 It seemed to have streamed data from other nodes and indeed the load is
 non-zero but I'm not clear what's keeping it right now from finishing.

 $ bin/nodetool -p 9004 -h localhost info
 51042355038140769519506191114765231716
 Load : 22.49 GB
 Generation No: 1294133781
 Uptime (seconds) : 1795
 Heap Memory (MB) : 315.31 / 6117.00


 nodetool ring does not list this new node in the ring, although nodetool
 can happily talk to the new node, it's just not listing itself as a member
 of the ring. This is expected when the node is still bootstrapping, so the
 question is still how long might the bootstrap take and whether is it stuck.

 The data ins't huge so I find it hard to believe that streaming or anti
 compaction are the bottlenecks. I have ~20G on each node and the new node
 already has just about that so it seems that all data had already been
 streamed to it successfully, or at least most of the data... So what is it
 waiting for now? (same question, rephrased... ;)

 I tried:
 1. Restarting the new node. No good. All logs seem normal but at the end
 the node is still in bootstrap mode.
 2. As someone suggested I increased the rpc timeout from 10k to 30k
 (RpcTimeoutInMillis) but that didn't seem to help. I did this only on the
 new node. Should I have done that on all (old) nodes as well? Or maybe only
 on the ones that were supposed to stream data to that node.
 3. Logging level at DEBUG now but nothing interesting going on except
 for occasional messages such as [1] or [2]

 So the question is: what's keeping the new node from finishing the
 bootstrap and how can I check its status?
 Thanks

 [1] DEBUG [Timer-1] 2011-01-04 05:21:24,402 LoadDisseminator.java (line
 36) Disseminating load info ...
 [2] DEBUG [RMI TCP Connection(22)-192.168.252.88] 2011-01-04 05:12:48,033
 StorageService.java (line 1189) computing ranges for
 28356863910078205288614550619314017621,
 56713727820156410577229101238628035242,

Re: Bootstrapping taking long

2011-01-04 Thread Ran Tavory

Running nodetool decommission didn't help. Actually the node refused to
decommission itself (b/c it wasn't part of the ring). So I simply stopped
the process, deleted all the data directories and started it again. It
worked in the sense that the node bootstrapped again but, as before, after it
had finished moving the data nothing happened for a long time (I'm still
waiting, but nothing seems to be happening).

Any hints on how to analyze a stuck bootstrapping node?
thanks
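
One check that can at least be automated is the symptom itself. The Python
sketch below assumes the three-line output of "nodetool streams" shown in this
thread; looks_stalled is a made-up helper name, not part of any Cassandra tool.
Note that the same output also appears while source nodes are still
anticompacting, so on its own it only says the node is idle, not why.

```python
def looks_stalled(streams_output):
    """True if the node reports bootstrap mode but no stream activity."""
    lines = [line.strip() for line in streams_output.splitlines() if line.strip()]
    bootstrapping = "Mode: Bootstrapping" in lines
    idle = ("Not sending any streams." in lines
            and "Not receiving any streams." in lines)
    return bootstrapping and idle


# Output copied from the nodetool invocation quoted in this thread.
sample = """\
Mode: Bootstrapping
Not sending any streams.
Not receiving any streams.
"""
stalled = looks_stalled(sample)
```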

On Tue, Jan 4, 2011 at 1:51 PM, Ran Tavory ran...@gmail.com wrote:

Thanks Shimi, so indeed anticompaction was run on one of the other nodes
from the same DC but to my understanding it has already ended. A few hour
ago...
I plenty of log messages such as [1] which ended a couple of hours ago, and
I've seen the new node streaming and accepting the data from the node which
performed the anticompaction and so far it was normal so it seemed that data
is at its right place. But now the new node seems sort of stuck. None of the
other nodes is anticompacting right now or had been anticompacting since
then.
The new node's CPU is close to zero, it's iostats are almost zero so I
can't find another bottleneck that would keep it hanging.

On the IRC someone suggested I'd maybe retry to join this node,
e.g. decommission and rejoin it again. I'll try it now...

[1]
INFO [COMPACTION-POOL:1] 2011-01-04 04:04:09,721 CompactionManager.java
(line 338) AntiCompacting
[org.apache.cassandra.io.SSTableReader(path='/outbrain/cassandra/data/outbrain_kvdb/KvAds-6449-Data.db')]
INFO [COMPACTION-POOL:1] 2011-01-04 04:34:18,683 CompactionManager.java
(line 338) AntiCompacting
[org.apache.cassandra.io.SSTableReader(path='/outbrain/cassandra/data/outbrain_kvdb/KvImpressions-3874-Data.db'),org.apache.cassandra.io.SSTableReader(path='/outbrain/cassandra/data/outbrain_kvdb/KvImpressions-3873-Data.db'),org.apache.cassandra.io.SSTableReader(path='/outbrain/cassandra/data/outbrain_kvdb/KvImpressions-3876-Data.db')]
INFO [COMPACTION-POOL:1] 2011-01-04 04:34:19,132 CompactionManager.java
(line 338) AntiCompacting
[org.apache.cassandra.io.SSTableReader(path='/outbrain/cassandra/data/outbrain_kvdb/KvRatings-951-Data.db'),org.apache.cassandra.io.SSTableReader(path='/outbrain/cassandra/data/outbrain_kvdb/KvRatings-976-Data.db'),org.apache.cassandra.io.SSTableReader(path='/outbrain/cassandra/data/outbrain_kvdb/KvRatings-978-Data.db')]
INFO [COMPACTION-POOL:1] 2011-01-04 04:34:26,486 CompactionManager.java
(line 338) AntiCompacting
[org.apache.cassandra.io.SSTableReader(path='/outbrain/cassandra/data/outbrain_kvdb/KvAds-6449-Data.db')]

On Tue, Jan 4, 2011 at 12:45 PM, shimi shim...@gmail.com wrote:

In my experience most of the time it takes for a node to join the cluster
is the anticompaction on the other nodes. The streaming part is very fast.
Check the other nodes logs to see if there is any node doing
anticompaction.
I don't remember how much data I had in the cluster when I needed to
add/remove nodes. I do remember that it took a few hours.

The node will join the ring only when it will finish the bootstrap.

Shimi

On Tue, Jan 4, 2011 at 12:28 PM, Ran Tavory ran...@gmail.com wrote:

I asked the same question on the IRC but no luck there, everyone's asleep
;)...

Using 0.6.6 I'm adding a new node to the cluster.
It starts out fine but then gets stuck on the bootstrapping state for too
long. More than an hour and still counting.

$ bin/nodetool -p 9004 -h localhost streams
Mode: Bootstrapping
Not sending any streams.
Not receiving any streams.

It seemed to have streamed data from other nodes and indeed the load is
non-zero but I'm not clear what's keeping it right now from finishing.

$ bin/nodetool -p 9004 -h localhost info
51042355038140769519506191114765231716
Load : 22.49 GB
Generation No: 1294133781
Uptime (seconds) : 1795
Heap Memory (MB) : 315.31 / 6117.00

nodetool ring does not list this new node in the ring, although nodetool
can happily talk to the new node, it's just not listing itself as a member
of the ring. This is expected when the node is still bootstrapping, so the
question is still how long might the bootstrap take and whether is it stuck.

The data ins't huge so I find it hard to believe that streaming or anti
compaction are the bottlenecks. I have ~20G on each node and the new node
already has just about that so it seems that all data had already been
streamed to it successfully, or at least most of the data... So what is it
waiting for now? (same question, rephrased... ;)

I tried:
1. Restarting the new node. No good. All logs seem normal but at the end
the node is still in bootstrap mode.
2. As someone suggested I increased the rpc timeout from 10k to 30k
(RpcTimeoutInMillis) but that didn't seem to help. I did this only on the
new node. Should I have done that on all (old) nodes as well? Or maybe only
on the ones that were supposed to stream data to that node.
3.

Re: Bootstrapping taking long

2011-01-04 Thread Jake Luciani

In 0.6, locate the node doing anti-compaction and look in the streams
subdirectory in the keyspace data dir to monitor the anti-compaction
progress (it puts the new SSTables for the bootstrapping node in there)
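
A small Python sketch of that check follows. The subdirectory name "streams"
is taken from the description above and may differ on your version, so it is
a parameter; pending_stream_files is a hypothetical helper name and the
keyspace directory is whatever your data directory setting points at.

```python
import os


def pending_stream_files(keyspace_dir, subdir="streams"):
    """List (name, size_bytes) for files staged for streaming, if any."""
    path = os.path.join(keyspace_dir, subdir)
    if not os.path.isdir(path):
        return []  # nothing staged, or the directory name is different
    entries = []
    for name in sorted(os.listdir(path)):
        full = os.path.join(path, name)
        if os.path.isfile(full):
            entries.append((name, os.path.getsize(full)))
    return entries
```

Calling it repeatedly on the source node and watching the sizes grow shows
anticompaction progress; an empty list means nothing is being prepared.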

On Tue, Jan 4, 2011 at 8:01 AM, Ran Tavory ran...@gmail.com wrote:

Any hints how to analyze a stuck bootstrapping node??
thanks

On Tue, Jan 4, 2011 at 1:51 PM, Ran Tavory ran...@gmail.com wrote:

Thanks Shimi, so indeed anticompaction was run on one of the other nodes
from the same DC but to my understanding it has already ended. A few hour
ago...
I plenty of log messages such as [1] which ended a couple of hours ago,
and I've seen the new node streaming and accepting the data from the node
which performed the anticompaction and so far it was normal so it seemed
that data is at its right place. But now the new node seems sort of stuck.
None of the other nodes is anticompacting right now or had been
anticompacting since then.
The new node's CPU is close to zero, it's iostats are almost zero so I
can't find another bottleneck that would keep it hanging.

On the IRC someone suggested I'd maybe retry to join this node,
e.g. decommission and rejoin it again. I'll try it now...

On Tue, Jan 4, 2011 at 12:45 PM, shimi shim...@gmail.com wrote:

The node will join the ring only when it will finish the bootstrap.

Shimi

On Tue, Jan 4, 2011 at 12:28 PM, Ran Tavory ran...@gmail.com wrote:

I asked the same question on the IRC but no luck there, everyone's
asleep ;)...

Using 0.6.6 I'm adding a new node to the cluster.
It starts out fine but then gets stuck on the bootstrapping state for
too long. More than an hour and still counting.

$ bin/nodetool -p 9004 -h localhost streams
Mode: Bootstrapping
Not sending any streams.
Not receiving any streams.

It seemed to have streamed data from other nodes and indeed the load is
non-zero but I'm not clear what's keeping it right now from finishing.

$ bin/nodetool -p 9004 -h localhost info
51042355038140769519506191114765231716
Load : 22.49 GB
Generation No: 1294133781
Uptime (seconds) : 1795
Heap Memory (MB) : 315.31 / 6117.00

I tried:
1. Restarting the new node. No good. All logs seem normal but at the end
the node is still in bootstrap mode.
2.

Re: Bootstrapping taking long

2011-01-04 Thread shimi

You will have something new to talk about in your talk tomorrow :)

You said that the anticompaction ran on only a single node? I think that
your new node should get data from at least two other nodes (depending on
the replication factor). Maybe the problem is not in the new node.
In old versions (I think prior to 0.6.3) there was a case of stuck bootstrap
that required restarting the new node and the nodes that were supposed to
stream data to it. As far as I remember this case was resolved. I haven't
seen this problem since then.

Shimi

On Tue, Jan 4, 2011 at 3:01 PM, Ran Tavory ran...@gmail.com wrote:

Any hints how to analyze a stuck bootstrapping node??
thanks

On Tue, Jan 4, 2011 at 1:51 PM, Ran Tavory ran...@gmail.com wrote:

Thanks Shimi, so indeed anticompaction was run on one of the other nodes
from the same DC but to my understanding it has already ended. A few hour
ago...
I plenty of log messages such as [1] which ended a couple of hours ago,
and I've seen the new node streaming and accepting the data from the node
which performed the anticompaction and so far it was normal so it seemed
that data is at its right place. But now the new node seems sort of stuck.
None of the other nodes is anticompacting right now or had been
anticompacting since then.
The new node's CPU is close to zero, it's iostats are almost zero so I
can't find another bottleneck that would keep it hanging.

On the IRC someone suggested I'd maybe retry to join this node,
e.g. decommission and rejoin it again. I'll try it now...

On Tue, Jan 4, 2011 at 12:45 PM, shimi shim...@gmail.com wrote:

The node will join the ring only when it will finish the bootstrap.

Shimi

On Tue, Jan 4, 2011 at 12:28 PM, Ran Tavory ran...@gmail.com wrote:

I asked the same question on the IRC but no luck there, everyone's
asleep ;)...

Using 0.6.6 I'm adding a new node to the cluster.
It starts out fine but then gets stuck on the bootstrapping state for
too long. More than an hour and still counting.

$ bin/nodetool -p 9004 -h localhost streams
Mode: Bootstrapping
Not sending any streams.
Not receiving any streams.

It seemed to have streamed data from other nodes and indeed the load is
non-zero but I'm not clear what's keeping it right now from finishing.

$ bin/nodetool -p 9004 -h localhost info
51042355038140769519506191114765231716
Load : 22.49 GB
Generation No: 1294133781
Uptime (seconds) : 1795
Heap Memory (MB) : 315.31 / 6117.00

The data ins't huge so I find it hard to believe that streaming or anti
compaction are the bottlenecks. I have ~20G on each node

Re: Bootstrapping taking long

2011-01-04 Thread Ran Tavory

Thanks Jake, but unfortunately the streams directory is empty, so I don't
think that any of the nodes is anti-compacting data right now or has been in
the past 5 hours.
It seems that all the data was already transferred to the joining host, but
the joining node, after having received the data, still remains in
bootstrapping mode and does not join the cluster. I'm not sure that *all* data
was transferred (perhaps other nodes need to transfer more data) but nothing
is actually happening so I assume all has been moved.
Perhaps it's a configuration error on my part. Should I use
AutoBootstrap=true? Anything else I should look out for in the
configuration file or elsewhere?

On Tue, Jan 4, 2011 at 4:08 PM, Jake Luciani jak...@gmail.com wrote:

On Tue, Jan 4, 2011 at 8:01 AM, Ran Tavory ran...@gmail.com wrote:

Any hints how to analyze a stuck bootstrapping node??
thanks

On Tue, Jan 4, 2011 at 1:51 PM, Ran Tavory ran...@gmail.com wrote:

Thanks Shimi, so indeed anticompaction was run on one of the other nodes
from the same DC but to my understanding it has already ended. A few hour
ago...
I plenty of log messages such as [1] which ended a couple of hours ago,
and I've seen the new node streaming and accepting the data from the node
which performed the anticompaction and so far it was normal so it seemed
that data is at its right place. But now the new node seems sort of stuck.
None of the other nodes is anticompacting right now or had been
anticompacting since then.
The new node's CPU is close to zero, it's iostats are almost zero so I
can't find another bottleneck that would keep it hanging.

On the IRC someone suggested I'd maybe retry to join this node,
e.g. decommission and rejoin it again. I'll try it now...

On Tue, Jan 4, 2011 at 12:45 PM, shimi shim...@gmail.com wrote:

In my experience most of the time it takes for a node to join the
cluster is the anticompaction on the other nodes. The streaming part is
very
fast.
Check the other nodes logs to see if there is any node doing
anticompaction.
I don't remember how much data I had in the cluster when I needed to
add/remove nodes. I do remember that it took a few hours.

The node will join the ring only when it will finish the bootstrap.

Shimi

On Tue, Jan 4, 2011 at 12:28 PM, Ran Tavory ran...@gmail.com wrote:

I asked the same question on the IRC but no luck there, everyone's
asleep ;)...

Using 0.6.6 I'm adding a new node to the cluster.
It starts out fine but then gets stuck on the bootstrapping state for
too long. More than an hour and still counting.

$ bin/nodetool -p 9004 -h localhost streams
Mode: Bootstrapping
Not sending any streams.
Not receiving any streams.

It seemed to have streamed data from other nodes and indeed the load is
non-zero but I'm not clear what's keeping it right now from finishing.

$ bin/nodetool -p 9004 -h localhost info
51042355038140769519506191114765231716
Load : 22.49 GB
Generation No: 1294133781
Uptime (seconds) : 1795
Heap Memory (MB) : 315.31 / 6117.00

nodetool

Re: Bootstrapping taking long

2011-01-04 Thread Ran Tavory

I'm still at a loss. I haven't been able to resolve this. I tried
adding another node at a different location on the ring but this node
too remains stuck in the bootstrapping state for many hours without
any of the other nodes being busy with anticompaction or anything
else. I don't know what's keeping it from finishing the bootstrap: no
CPU, no I/O, files were already streamed, so what is it waiting for?
I read the release notes of 0.6.7 and 0.6.8 and there didn't seem to
be anything addressing a similar issue so I figured there was no point
in upgrading. But let me know if you think there is.
Or any other advice...

On Tuesday, January 4, 2011, Ran Tavory ran...@gmail.com wrote:
Thanks Jake, but unfortunately the streams directory is empty so I don't
think that any of the nodes is anti-compacting data right now or had been in
the past 5 hours. It seems that all the data was already transferred to the
joining host but the joining node, after having received the data would still
remain in bootstrapping mode and not join the cluster. I'm not sure that
*all* data was transferred (perhaps other nodes need to transfer more data)
but nothing is actually happening so I assume all has been moved.
Perhaps it's a configuration error from my part. Should I use I use
AutoBootstrap=true ? Anything else I should look out for in the configuration
file or something else?

On Tue, Jan 4, 2011 at 4:08 PM, Jake Luciani jak...@gmail.com wrote:

In 0.6, locate the node doing anti-compaction and look in the streams
subdirectory in the keyspace data dir to monitor the anti-compaction progress
(it puts new SSTables for bootstrapping node in there)

On Tue, Jan 4, 2011 at 8:01 AM, Ran Tavory ran...@gmail.com wrote:

Running nodetool decommission didn't help. Actually the node refused to
decommission itself (b/c it wasn't part of the ring). So I simply stopped the
process, deleted all the data directories and started it again. It worked in
the sense of the node bootstrapped again but as before, after it had finished
moving the data nothing happened for a long time (I'm still waiting, but
nothing seems to be happening).

Any hints how to analyze a stuck bootstrapping node??thanks
On Tue, Jan 4, 2011 at 1:51 PM, Ran Tavory ran...@gmail.com wrote:
Thanks Shimi, so indeed anticompaction was run on one of the other nodes from
the same DC but to my understanding it has already ended. A few hour ago...

I plenty of log messages such as [1] which ended a couple of hours ago, and
I've seen the new node streaming and accepting the data from the node which
performed the anticompaction and so far it was normal so it seemed that data
is at its right place. But now the new node seems sort of stuck. None of the
other nodes is anticompacting right now or had been anticompacting since then.

The new node's CPU is close to zero, it's iostats are almost zero so I can't
find another bottleneck that would keep it hanging.
On the IRC someone suggested I'd maybe retry to join this node,
e.g. decommission and rejoin it again. I'll try it now...

INFO [COMPACTION-POOL:1] 2011-01-04 04:34:18,683 CompactionManager.java
(line 338) AntiCompacting
[org.apache.cassandra.io.SSTableReader(path='/outbrain/cassandra/data/outbrain_kvdb/KvImpressions-3874-Data.db'),org.apache.cassandra.io.SSTableReader(path='/outbrain/cassandra/data/outbrain_kvdb/KvImpressions-3873-Data.db'),org.apache.cassandra.io.SSTableReader(path='/outbrain/cassandra/data/outbrain_kvdb/KvImpressions-3876-Data.db')]

INFO [COMPACTION-POOL:1] 2011-01-04 04:34:19,132 CompactionManager.java
(line 338) AntiCompacting
[org.apache.cassandra.io.SSTableReader(path='/outbrain/cassandra/data/outbrain_kvdb/KvRatings-951-Data.db'),org.apache.cassandra.io.SSTableReader(path='/outbrain/cassandra/data/outbrain_kvdb/KvRatings-976-Data.db'),org.apache.cassandra.io.SSTableReader(path='/outbrain/cassandra/data/outbrain_kvdb/KvRatings-978-Data.db')]

INFO [COMPACTION-POOL:1] 2011-01-04 04:34:26,486 CompactionManager.java
(line 338) AntiCompacting
[org.apache.cassandra.io.SSTableReader(path='/outbrain/cassandra/data/outbrain_kvdb/KvAds-6449-Data.db')]

On Tue, Jan 4, 2011 at 12:45 PM, shimi shim...@gmail.com wrote:

In my experience most of the time it takes for a node to join the cluster is
the anticompaction on the other nodes. The streaming part is very fast.
Check the other nodes logs to see if there is any node doing anticompaction.I
don't remember how much data I had in the cluster when I needed to add/remove
nodes. I do remember that it took a few hours.

The node will join the ring only when it will finish the bootstrap.
--
/Ran

--
/Ran

Re: Bootstrapping taking long

2011-01-04 Thread Nate McCall

Does the new node have itself in the list of seeds, by any chance? This
could cause some issues if so.

On Tue, Jan 4, 2011 at 4:10 PM, Ran Tavory ran...@gmail.com wrote:
 I'm still at lost.   I haven't been able to resolve this. I tried
 adding another node at a different location on the ring but this node
 too remains stuck in the bootstrapping state for many hours without
 any of the other nodes being busy with anti compaction or anything
 else. I don't know what's keeping it from finishing the bootstrap,no
 CPU, no io, files were already streamed so what is it waiting for?
 I read the release notes of 0.6.7 and 0.6.8 and there didn't seem to
 be anything addressing a similar issue so I figured there was no point
 in upgrading. But let me know if you think there is.
 Or any other advice...

 On Tuesday, January 4, 2011, Ran Tavory ran...@gmail.com wrote:
 Thanks Jake, but unfortunately the streams directory is empty, so I don't
 think that any of the nodes is anti-compacting data right now or has been in
 the past 5 hours. It seems that all the data was already transferred to the
 joining host, but the joining node, after having received the data, still
 remains in bootstrapping mode and does not join the cluster. I'm not sure
 that *all* data was transferred (perhaps other nodes need to transfer more
 data) but nothing is actually happening so I assume all has been moved.
 Perhaps it's a configuration error on my part. Should I use
 AutoBootstrap=true? Anything else I should look out for in the
 configuration file or something else?


 On Tue, Jan 4, 2011 at 4:08 PM, Jake Luciani jak...@gmail.com wrote:

 In 0.6, locate the node doing anti-compaction and look in the streams
 subdirectory in the keyspace data dir to monitor the anti-compaction
 progress (it puts new SSTables for the bootstrapping node in there).
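
 A rough sketch of that check, assuming the subdirectory is literally named
 "streams" under the keyspace data dir (for example
 /outbrain/cassandra/data/outbrain_kvdb, the path seen in the log lines in this
 thread). The function name is hypothetical.

```python
import os


def stream_files(keyspace_dir, subdir="streams"):
    """Return (file name, size in bytes) for each file in the streams subdirectory.

    Returns an empty list if the subdirectory does not exist or is empty.
    """
    path = os.path.join(keyspace_dir, subdir)
    if not os.path.isdir(path):
        return []
    files = []
    for name in sorted(os.listdir(path)):
        full_path = os.path.join(path, name)
        if os.path.isfile(full_path):
            files.append((name, os.path.getsize(full_path)))
    return files
```

 Calling it repeatedly on the anticompacting node and comparing the sizes
 shows whether files are still growing. An empty result suggests that node is
 not currently preparing data for the bootstrapping node.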


 On Tue, Jan 4, 2011 at 8:01 AM, Ran Tavory ran...@gmail.com wrote:


 Running nodetool decommission didn't help. Actually the node refused to
 decommission itself (because it wasn't part of the ring). So I simply stopped
 the process, deleted all the data directories and started it again. It
 worked in the sense that the node bootstrapped again, but as before, after it
 had finished moving the data nothing happened for a long time (I'm still
 waiting, but nothing seems to be happening).

 Any hints on how to analyze a stuck bootstrapping node? Thanks
 On Tue, Jan 4, 2011 at 1:51 PM, Ran Tavory ran...@gmail.com wrote:
 Thanks Shimi, so indeed anticompaction was run on one of the other nodes
 from the same DC but to my understanding it has already ended, a few hours
 ago...

 I see plenty of log messages such as [1], which ended a couple of hours ago,
 and I've seen the new node streaming and accepting the data from the node
 which performed the anticompaction. So far it was normal, so it seemed that
 the data is in its right place. But now the new node seems sort of stuck.
 None of the other nodes is anticompacting right now or has been
 anticompacting since then.

 The new node's CPU is close to zero and its iostats are almost zero, so I
 can't find another bottleneck that would keep it hanging.
 On IRC someone suggested I'd maybe retry to join this node,
 e.g. decommission and rejoin it again. I'll try it now...






 [1] INFO [COMPACTION-POOL:1] 2011-01-04 04:04:09,721 CompactionManager.java
 (line 338) AntiCompacting
 [org.apache.cassandra.io.SSTableReader(path='/outbrain/cassandra/data/outbrain_kvdb/KvAds-6449-Data.db')]

  INFO [COMPACTION-POOL:1] 2011-01-04 04:34:18,683 CompactionManager.java
 (line 338) AntiCompacting
 [org.apache.cassandra.io.SSTableReader(path='/outbrain/cassandra/data/outbrain_kvdb/KvImpressions-3874-Data.db'),org.apache.cassandra.io.SSTableReader(path='/outbrain/cassandra/data/outbrain_kvdb/KvImpressions-3873-Data.db'),org.apache.cassandra.io.SSTableReader(path='/outbrain/cassandra/data/outbrain_kvdb/KvImpressions-3876-Data.db')]

  INFO [COMPACTION-POOL:1] 2011-01-04 04:34:19,132 CompactionManager.java
 (line 338) AntiCompacting
 [org.apache.cassandra.io.SSTableReader(path='/outbrain/cassandra/data/outbrain_kvdb/KvRatings-951-Data.db'),org.apache.cassandra.io.SSTableReader(path='/outbrain/cassandra/data/outbrain_kvdb/KvRatings-976-Data.db'),org.apache.cassandra.io.SSTableReader(path='/outbrain/cassandra/data/outbrain_kvdb/KvRatings-978-Data.db')]

  INFO [COMPACTION-POOL:1] 2011-01-04 04:34:26,486 CompactionManager.java
 (line 338) AntiCompacting
 [org.apache.cassandra.io.SSTableReader(path='/outbrain/cassandra/data/outbrain_kvdb/KvAds-6449-Data.db')]





 On Tue, Jan 4, 2011 at 12:45 PM, shimi shim...@gmail.com wrote:





 In my experience most of the time it takes for a node to join the cluster is
 spent on anticompaction on the other nodes. The streaming part is very fast.
 Check the other nodes' logs to see if any node is doing
 anticompaction. I don't remember how much data I had in the cluster when I
 needed to add/remove

Re: Bootstrapping taking long

2011-01-04 Thread Ran Tavory

The new node does not see itself as part of the ring; it sees all others but
itself, so from that perspective the view is consistent.
The only problem is that the node never finishes bootstrapping. It stays in
this state for hours (it's been 20 hours now...).

$ bin/nodetool -p 9004 -h localhost streams
Mode: Bootstrapping
Not sending any streams.
Not receiving any streams.
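
A small sketch for spotting this state automatically when polling the node, assuming the nodetool streams output consists of the three lines shown above. The function names are hypothetical, and any line other than the ones shown here is treated as possible stream activity.

```python
def parse_streams_output(text):
    """Return (mode, sending, receiving) parsed from `nodetool streams` output.

    Only the three lines quoted in this thread are recognized. If the
    "Not sending" or "Not receiving" line is absent, the node is assumed
    to be streaming in that direction.
    """
    mode = None
    sending = True
    receiving = True
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if line.startswith("Mode:"):
            mode = line.split(":", 1)[1].strip()
        elif line == "Not sending any streams.":
            sending = False
        elif line == "Not receiving any streams.":
            receiving = False
    return mode, sending, receiving


def looks_idle_while_bootstrapping(text):
    """Return True if the node reports Bootstrapping mode with no streams either way."""
    mode, sending, receiving = parse_streams_output(text)
    return mode == "Bootstrapping" and not sending and not receiving
```

A single idle reading is not conclusive, since the node may be between transfers. The same result over many polls matches the stuck state described here.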

On Wed, Jan 5, 2011 at 1:20 AM, Nate McCall n...@riptano.com wrote:

Does the new node have itself in the list of seeds, by any chance? This
could cause some issues if so.

On Tue, Jan 4, 2011 at 4:10 PM, Ran Tavory ran...@gmail.com wrote:
I'm still at a loss. I haven't been able to resolve this. I tried
adding another node at a different location on the ring but this node
too remains stuck in the bootstrapping state for many hours without
any of the other nodes being busy with anticompaction or anything
else. I don't know what's keeping it from finishing the bootstrap: no
CPU, no IO, files were already streamed, so what is it waiting for?
I read the release notes of 0.6.7 and 0.6.8 and there didn't seem to
be anything addressing a similar issue so I figured there was no point
in upgrading. But let me know if you think there is.
Or any other advice...

On Tuesday, January 4, 2011, Ran Tavory ran...@gmail.com wrote:
Thanks Jake, but unfortunately the streams directory is empty, so I don't
think that any of the nodes is anti-compacting data right now or has been in
the past 5 hours. It seems that all the data was already transferred to the
joining host, but the joining node, after having received the data, still
remains in bootstrapping mode and does not join the cluster. I'm not sure
that *all* data was transferred (perhaps other nodes need to transfer more
data) but nothing is actually happening so I assume all has been moved.
Perhaps it's a configuration error on my part. Should I use
AutoBootstrap=true? Anything else I should look out for in the
configuration file or something else?

On Tue, Jan 4, 2011 at 4:08 PM, Jake Luciani jak...@gmail.com wrote:

On Tue, Jan 4, 2011 at 8:01 AM, Ran Tavory ran...@gmail.com wrote:

Any hints on how to analyze a stuck bootstrapping node? Thanks
On Tue, Jan 4, 2011 at 1:51 PM, Ran Tavory ran...@gmail.com wrote:
Thanks Shimi, so indeed anticompaction was run on one of the other nodes
from the same DC but to my understanding it has already ended, a few hours
ago...

I see plenty of log messages such as [1], which ended a couple of hours ago,
and I've seen the new node streaming and accepting the data from the node
which performed the anticompaction. So far it was normal, so it seemed
that the data is in its right place. But now the new node seems sort of stuck.
None of the other nodes is anticompacting right now or has been
anticompacting since then.

The new node's CPU is close to zero and its iostats are almost zero, so I
can't find another bottleneck that would keep it hanging.
On IRC someone suggested I'd maybe retry to join this node,
e.g. decommission and rejoin it again. I'll try it now...

INFO [COMPACTION-POOL:1] 2011-01-04 04:34:26,486 CompactionManager.java
(line 338) AntiCompacting
