In 0.6, find the node that is doing the anti-compaction and watch the "streams" subdirectory of the keyspace data directory to monitor its progress (that is where it writes the new SSTables for the bootstrapping node).
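One way to watch that directory is to poll its total size and see whether it is still growing. The following is only a rough sketch: the helper names dir_usage and watch are made up for illustration, and the example path simply combines the keyspace data directory seen in the quoted logs with the "streams" name mentioned above, so adjust it to whatever exists on the anti-compacting node.

```python
import os
import time


def dir_usage(path):
    """Return (file_count, total_bytes) for regular files under path.

    A missing directory yields (0, 0), since os.walk produces nothing for it.
    """
    count = 0
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            try:
                total += os.path.getsize(os.path.join(root, name))
                count += 1
            except OSError:
                # The file vanished between listing and stat; skip it.
                continue
    return count, total


def watch(path, interval=30, rounds=10):
    """Print the directory size every `interval` seconds, `rounds` times."""
    previous = None
    for _ in range(rounds):
        count, total = dir_usage(path)
        delta = 0 if previous is None else total - previous
        stamp = time.strftime("%H:%M:%S")
        print(f"{stamp} files={count} bytes={total} delta={delta:+d}")
        previous = total
        time.sleep(interval)


# Example call (path is an assumption, see above):
# watch("/outbrain/cassandra/data/outbrain_kvdb/streams")
```

A delta that stays at zero over several rounds while the new node still reports "Mode: Bootstrapping" would suggest that anti-compaction is no longer producing data for it.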
On Tue, Jan 4, 2011 at 8:01 AM, Ran Tavory <ran...@gmail.com> wrote:
> Running nodetool decommission didn't help. Actually the node refused to
> decommission itself (b/c it wasn't part of the ring). So I simply stopped
> the process, deleted all the data directories and started it again. It
> worked in the sense of the node bootstrapped again but as before, after it
> had finished moving the data nothing happened for a long time (I'm still
> waiting, but nothing seems to be happening).
>
> Any hints how to analyze a "stuck" bootstrapping node??
> thanks
>
> On Tue, Jan 4, 2011 at 1:51 PM, Ran Tavory <ran...@gmail.com> wrote:
>
>> Thanks Shimi, so indeed anticompaction was run on one of the other nodes
>> from the same DC but to my understanding it has already ended. A few hours
>> ago...
>> I see plenty of log messages such as [1] which ended a couple of hours ago,
>> and I've seen the new node streaming and accepting the data from the node
>> which performed the anticompaction and so far it was normal so it seemed
>> that data is at its right place. But now the new node seems sort of stuck.
>> None of the other nodes is anticompacting right now or had been
>> anticompacting since then.
>> The new node's CPU is close to zero, its iostats are almost zero so I
>> can't find another bottleneck that would keep it hanging.
>>
>> On the IRC someone suggested I'd maybe retry to join this node,
>> e.g. decommission and rejoin it again. I'll try it now...
>>
>>
>> [1]
>> INFO [COMPACTION-POOL:1] 2011-01-04 04:04:09,721 CompactionManager.java
>> (line 338) AntiCompacting
>> [org.apache.cassandra.io.SSTableReader(path='/outbrain/cassandra/data/outbrain_kvdb/KvAds-6449-Data.db')]
>> INFO [COMPACTION-POOL:1] 2011-01-04 04:34:18,683 CompactionManager.java
>> (line 338) AntiCompacting
>> [org.apache.cassandra.io.SSTableReader(path='/outbrain/cassandra/data/outbrain_kvdb/KvImpressions-3874-Data.db'),org.apache.cassandra.io.SSTableReader(path='/outbrain/cassandra/data/outbrain_kvdb/KvImpressions-3873-Data.db'),org.apache.cassandra.io.SSTableReader(path='/outbrain/cassandra/data/outbrain_kvdb/KvImpressions-3876-Data.db')]
>> INFO [COMPACTION-POOL:1] 2011-01-04 04:34:19,132 CompactionManager.java
>> (line 338) AntiCompacting
>> [org.apache.cassandra.io.SSTableReader(path='/outbrain/cassandra/data/outbrain_kvdb/KvRatings-951-Data.db'),org.apache.cassandra.io.SSTableReader(path='/outbrain/cassandra/data/outbrain_kvdb/KvRatings-976-Data.db'),org.apache.cassandra.io.SSTableReader(path='/outbrain/cassandra/data/outbrain_kvdb/KvRatings-978-Data.db')]
>> INFO [COMPACTION-POOL:1] 2011-01-04 04:34:26,486 CompactionManager.java
>> (line 338) AntiCompacting
>> [org.apache.cassandra.io.SSTableReader(path='/outbrain/cassandra/data/outbrain_kvdb/KvAds-6449-Data.db')]
>>
>> On Tue, Jan 4, 2011 at 12:45 PM, shimi <shim...@gmail.com> wrote:
>>
>>> In my experience most of the time it takes for a node to join the cluster
>>> is the anticompaction on the other nodes. The streaming part is very fast.
>>> Check the other nodes logs to see if there is any node doing
>>> anticompaction.
>>> I don't remember how much data I had in the cluster when I needed to
>>> add/remove nodes. I do remember that it took a few hours.
>>>
>>> The node will join the ring only when it will finish the bootstrap.
>>>
>>> Shimi
>>>
>>>
>>> On Tue, Jan 4, 2011 at 12:28 PM, Ran Tavory <ran...@gmail.com> wrote:
>>>
>>>> I asked the same question on the IRC but no luck there, everyone's
>>>> asleep ;)...
>>>>
>>>> Using 0.6.6 I'm adding a new node to the cluster.
>>>> It starts out fine but then gets stuck on the bootstrapping state for
>>>> too long. More than an hour and still counting.
>>>>
>>>> $ bin/nodetool -p 9004 -h localhost streams
>>>>> Mode: Bootstrapping
>>>>> Not sending any streams.
>>>>> Not receiving any streams.
>>>>
>>>>
>>>> It seemed to have streamed data from other nodes and indeed the load is
>>>> non-zero but I'm not clear what's keeping it right now from finishing.
>>>>
>>>>> $ bin/nodetool -p 9004 -h localhost info
>>>>> 51042355038140769519506191114765231716
>>>>> Load : 22.49 GB
>>>>> Generation No : 1294133781
>>>>> Uptime (seconds) : 1795
>>>>> Heap Memory (MB) : 315.31 / 6117.00
>>>>
>>>>
>>>> nodetool ring does not list this new node in the ring, although nodetool
>>>> can happily talk to the new node, it's just not listing itself as a member
>>>> of the ring. This is expected when the node is still bootstrapping, so the
>>>> question is still how long might the bootstrap take and whether it is
>>>> stuck.
>>>>
>>>> The data isn't huge so I find it hard to believe that streaming or anti
>>>> compaction are the bottlenecks. I have ~20G on each node and the new node
>>>> already has just about that so it seems that all data had already been
>>>> streamed to it successfully, or at least most of the data... So what is it
>>>> waiting for now? (same question, rephrased... ;)
>>>>
>>>> I tried:
>>>> 1. Restarting the new node. No good. All logs seem normal but at the end
>>>> the node is still in bootstrap mode.
>>>> 2. As someone suggested I increased the rpc timeout from 10k to 30k
>>>> (RpcTimeoutInMillis) but that didn't seem to help. I did this only on the
>>>> new node. Should I have done that on all (old) nodes as well?
>>>> Or maybe only on the ones that were supposed to stream data to that node.
>>>> 3. Logging level at DEBUG now but nothing interesting going on except
>>>> for occasional messages such as [1] or [2]
>>>>
>>>> So the question is: what's keeping the new node from finishing the
>>>> bootstrap and how can I check its status?
>>>> Thanks
>>>>
>>>> [1] DEBUG [Timer-1] 2011-01-04 05:21:24,402 LoadDisseminator.java (line
>>>> 36) Disseminating load info ...
>>>> [2] DEBUG [RMI TCP Connection(22)-192.168.252.88] 2011-01-04
>>>> 05:12:48,033 StorageService.java (line 1189) computing ranges for
>>>> 28356863910078205288614550619314017621,
>>>> 56713727820156410577229101238628035242,
>>>> 85070591730234615865843651857942052863,
>>>> 113427455640312821154458202477256070484,
>>>> 141784319550391026443072753096570088105,
>>>> 170141183460469231731687303715884105727
>>>>
>>>> --
>>>> /Ran
>>>>
>>
>> --
>> /Ran
>>
>
> --
> /Ran