Re: nodetool move seems slow

2014-06-05 Thread Jason Tyler
Hi Rob,

THX for your response and link to the issue.

The move did complete after a restart!


Cheers,

~Jason
***
From: Robert Coli rc...@eventbrite.com
Reply-To: user@cassandra.apache.org
Date: Wednesday, June 4, 2014 at 5:01 PM
To: user@cassandra.apache.org
Cc: Francois Richard frich...@yahoo-inc.com
Subject: Re: nodetool move seems slow

On Wed, Jun 4, 2014 at 2:34 PM, Jason Tyler jaty...@yahoo-inc.com wrote:
I wrote 'apparent progress' because it reports “MOVING” and the Pending 
Commands/Responses are changing over time.  However, I haven’t seen the 
individual .db files progress go above 0%.

Your move is hung. Restart the affected nodes [1] and then restart the move.

=Rob
[1] https://issues.apache.org/jira/browse/CASSANDRA-3486


nodetool move seems slow

2014-06-04 Thread Jason Tyler
Hello,

We have a 5-node cluster running Cassandra 1.2.16, with a significant amount of 
data:


Address        Rack   Status  State   Load     Owns    Token
                                                       6783174585269344219
10.198.xx.xx1  rack1  Up      Normal  2.59 TB  60.00%  -9223372036854775808
10.198.xx.xx2  rack1  Up      Normal  1.49 TB  40.00%  -5534023222112865485
10.198.xx.xx3  rack1  Up      Normal  2.18 TB  53.23%  -1844674407370955162
10.198.xx.xx4  rack1  Up      Normal  2.86 TB  80.00%  5534023222112865484
10.198.xx.xx5  rack1  Up      Moving  2.32 TB  66.77%  6783174585269344219



The first three nodes (.xx1 - .xx3 above) were at the desired tokens, so I 
issued a move on .xx4:

nodetool move 1844674407370955161
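
For reference, the token passed to the move above is the fourth of five evenly spaced tokens. The following is a minimal sketch of that arithmetic, assuming the cluster uses Murmur3Partitioner (token range -2**63 to 2**63 - 1), which the negative tokens in the ring output suggest. The function name is illustrative and is not part of Cassandra or nodetool.

```python
# Sketch: evenly spaced tokens for an N-node ring.
# Assumption: Murmur3Partitioner, whose tokens span -2**63 .. 2**63 - 1.

def balanced_tokens(node_count):
    """Return node_count evenly spaced tokens, starting at the minimum token."""
    min_token = -2**63
    step = 2**64 // node_count  # floor(2**64 / N) tokens between neighbours
    return [min_token + i * step for i in range(node_count)]
```

Calling `balanced_tokens(5)` gives the three tokens already held by .xx1 through .xx3, followed by 1844674407370955161 and 5534023222112865484.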


That was about 40hrs ago!


When I do nodetool netstats, I do see apparent progress:


jatyler@xx4:~$ nodetool netstats

Mode: MOVING

Not sending any streams.

Streaming from: /10.198.xx.xx2

   SyncCore: /var/cassandra/data/SyncCore/file-ic-31475-Data.db sections=1 progress=0/77699597 - 0%

…

   SyncCore: /var/cassandra/data/SyncCore/anotherFile-ic-32252-Data.db sections=1 progress=0/1254063427 - 0%

Read Repair Statistics:

Attempted: 8047367

Mismatch (Blocking): 97327

Mismatch (Background): 74369

Pool Name    Active   Pending   Completed
Commands     n/a      0         472255111
Responses    n/a      1         749751322



I wrote 'apparent progress' because it reports “MOVING” and the Pending 
Commands/Responses are changing over time.  However, I haven’t seen the 
individual .db files progress go above 0%.
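
One way to check this without eyeballing the output is to capture `nodetool netstats` twice, some minutes apart, and compare the per-file byte counts. The sketch below assumes the stream lines have the `sections=N progress=done/total` shape shown above. The helper names are hypothetical, and how the two snapshots are captured is left to the reader.

```python
import re

# Matches stream lines such as:
#   SyncCore: /path/file-ic-31475-Data.db sections=1 progress=0/77699597 - 0%
# \s+ also tolerates the line wrapping introduced by email clients.
STREAM_LINE = re.compile(
    r"(?P<path>/\S+)\s+sections=\d+\s+progress=(?P<done>\d+)/(?P<total>\d+)"
)

def parse_stream_progress(netstats_output):
    """Map each streamed file path to a (bytes_done, bytes_total) tuple."""
    progress = {}
    for match in STREAM_LINE.finditer(netstats_output):
        progress[match.group("path")] = (
            int(match.group("done")),
            int(match.group("total")),
        )
    return progress

def stalled_files(earlier, later):
    """Return paths seen in both snapshots whose byte count did not advance."""
    return sorted(
        path
        for path, (done, _total) in later.items()
        if path in earlier and done <= earlier[path][0]
    )
```

If every file in the later snapshot shows up in `stalled_files`, the streams are not making progress, which matches what the output above shows.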

Meanwhile, the system appears to have plenty of unused bandwidth, from 'iostat 
-x -m 1':


Device:  rrqm/s  wrqm/s      r/s     w/s  rMB/s  wMB/s  avgrq-sz  avgqu-sz  await  svctm  %util
sda        0.00   56.00  1338.00  171.00  57.59   0.89     79.36      0.57   0.38   0.17  25.30

avg-cpu:  %user   %nice  %system  %iowait  %steal   %idle
          22.77    1.82     2.35     0.20    0.00   72.86

Device:  rrqm/s  wrqm/s      r/s     w/s  rMB/s  wMB/s  avgrq-sz  avgqu-sz  await  svctm  %util
sda        0.00    0.00   785.00    0.00  33.80   0.00     88.17      0.27   0.35   0.18  14.10

avg-cpu:  %user   %nice  %system  %iowait  %steal   %idle
          20.16    2.05     2.22     0.20    0.00   75.37




Is 40 hours too long for this move?  Should I be seeing individual .db files 
report more progress?  Should I start with the first box (even though the token 
appears correct)?


Any thoughts would be greatly appreciated.

THX


Cheers,

~Jason
***


Re: nodetool move seems slow

2014-06-04 Thread Robert Coli
On Wed, Jun 4, 2014 at 2:34 PM, Jason Tyler jaty...@yahoo-inc.com wrote:

  I wrote 'apparent progress' because it reports “MOVING” and the Pending
 Commands/Responses are changing over time.  However, I haven’t seen the
 individual .db files progress go above 0%.


Your move is hung. Restart the affected nodes [1] and then restart the move.

=Rob
[1] https://issues.apache.org/jira/browse/CASSANDRA-3486