Re: nodetool repair saying starting and then nothing, and nothing in any of the server logs either

2014-07-01 Thread Brian Tarbox
Does this output from jstack indicate a problem? ReadRepairStage:12170 daemon prio=10 tid=0x7f9dcc018800 nid=0x7361 waiting on condition [0x7f9db540c000] java.lang.Thread.State: TIMED_WAITING (parking) at sun.misc.Unsafe.park(Native Method) - parking to wait for

Re: nodetool repair saying starting and then nothing, and nothing in any of the server logs either

2014-07-01 Thread Robert Coli
On Tue, Jul 1, 2014 at 11:09 AM, Brian Tarbox tar...@cabotresearch.com wrote: We're running 1.2.13. 1.2.17 contains a few streaming fixes which might help. Any chance that doing a rolling-restart would help? Probably not. Would running without the -pr improve the odds? No, that'd

Re: nodetool repair saying starting and then nothing, and nothing in any of the server logs either

2014-07-01 Thread Brian Tarbox
Given that an upgrade is (for various internal reasons) not an option at this point...is there anything I can do to get repair working again? I'll also mention that I see this behavior from all nodes. Thanks. On Tue, Jul 1, 2014 at 2:51 PM, Robert Coli rc...@eventbrite.com wrote: On Tue, Jul

Re: nodetool repair saying starting and then nothing, and nothing in any of the server logs either

2014-07-01 Thread Robert Coli
On Tue, Jul 1, 2014 at 11:54 AM, Brian Tarbox tar...@cabotresearch.com wrote: Given that an upgrade is (for various internal reasons) not an option at this point...is there anything I can do to get repair working again? I'll also mention that I see this behavior from all nodes. I think

Re: nodetool repair saying starting and then nothing, and nothing in any of the server logs either

2014-07-01 Thread Brian Tarbox
For what purpose are you running repair? Because I read that we should! :-) We do delete data from one column family quite regularly...from the other CFs occasionally. We almost never run with less than 100% of our nodes up. In this configuration do we *need* to run repair? Thanks, On Tue,

Re: nodetool repair -snapshot option?

2014-07-01 Thread Phil Burress
Thanks! We retrieved all the ranges and started running repair on them. We ran through all of them but found one single range which brought the ENTIRE cluster down. All of the other ranges ran quickly and smoothly. This one problematic range reliably brings it down every time we try to run repair

Re: nodetool repair -snapshot option?

2014-07-01 Thread Robert Coli
On Tue, Jul 1, 2014 at 3:53 PM, Phil Burress philburress...@gmail.com wrote: Thanks! We retrieved all the ranges and started running repair on them. We ran through all of them but found one single range which brought the ENTIRE cluster down. All of the other ranges ran quickly and smoothly.

Re: nodetool repair -snapshot option?

2014-06-30 Thread Yuki Morishita
Repair uses snapshot option by default since 2.0.2 (see NEWS.txt). So you don't have to specify in your version. Do you have stacktrace when OOMed? On Mon, Jun 30, 2014 at 4:54 PM, Phil Burress philburress...@gmail.com wrote: We are running into an issue with nodetool repair. One or more of our

Re: nodetool repair -snapshot option?

2014-06-30 Thread Kevin Burton
The stack won't help a ton since the memory leak will occur elsewhere… the stack will just have the point where the memory allocation failed :-( On Mon, Jun 30, 2014 at 3:08 PM, Yuki Morishita mor.y...@gmail.com wrote: Repair uses snapshot option by default since 2.0.2 (see NEWS.txt). So you

Re: nodetool repair -snapshot option?

2014-06-30 Thread Robert Coli
On Mon, Jun 30, 2014 at 3:08 PM, Yuki Morishita mor.y...@gmail.com wrote: Repair uses snapshot option by default since 2.0.2 (see NEWS.txt). As a general meta comment, the process by which operationally important defaults change in Cassandra seems ad-hoc and sub-optimal. For the record, my

Re: nodetool repair -snapshot option?

2014-06-30 Thread Phil Burress
We are running repair -pr. We've tried subrange manually and that seems to work ok. I guess we'll go with that going forward. Thanks for all the info! On Mon, Jun 30, 2014 at 6:52 PM, Jaydeep Chovatia chovatia.jayd...@gmail.com wrote: Are you running full repair or on subset? If you are

Re: nodetool repair -snapshot option?

2014-06-30 Thread Phil Burress
One last question. Any tips on scripting a subrange repair? On Mon, Jun 30, 2014 at 7:12 PM, Phil Burress philburress...@gmail.com wrote: We are running repair -pr. We've tried subrange manually and that seems to work ok. I guess we'll go with that going forward. Thanks for all the info!

Re: nodetool repair -snapshot option?

2014-06-30 Thread Paulo Ricardo Motta Gomes
If you find it useful, I created a tool where you input the node IP, keyspace, column family, and optionally the number of partitions (default: 32K), and it outputs the list of subranges for that node, CF, partition size: https://github.com/pauloricardomg/cassandra-list-subranges So you can
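
For the scripting question above, here is a minimal Python sketch of one way to generate subrange repair commands. It is not Paulo's tool: the helper names `split_range` and `repair_commands`, the even split, and the default host are assumptions made for illustration. The `-st`/`-et` (start/end token) options are assumed to be supported by the nodetool version in use, so check `nodetool help repair` before relying on them.

```python
def split_range(start, end, parts):
    """Split the token range (start, end] into at most `parts` contiguous subranges.

    Assumption: the range does not wrap around the ring (start < end).
    """
    if parts < 1:
        raise ValueError("parts must be >= 1")
    if end <= start:
        raise ValueError("wrapping ranges are not handled in this sketch")
    width = end - start
    bounds = [start + width * i // parts for i in range(parts + 1)]
    # Drop empty subranges, which appear when parts > width.
    return [(lo, hi) for lo, hi in zip(bounds[:-1], bounds[1:]) if lo < hi]


def repair_commands(keyspace, start, end, parts, host="127.0.0.1"):
    """Build one nodetool command line per subrange (commands are not executed)."""
    return [
        f"nodetool -h {host} repair -st {lo} -et {hi} {keyspace}"
        for lo, hi in split_range(start, end, parts)
    ]


if __name__ == "__main__":
    # Token range taken from the repair exception quoted elsewhere in this listing;
    # "keyspace_name" is the placeholder used in that same message.
    for cmd in repair_commands(
        "keyspace_name", 1246984843639507027, 1266616572749926276, parts=4
    ):
        print(cmd)
```

Printing the commands rather than running them keeps the sketch side-effect free; the output can be reviewed and piped to a shell one line at a time, which also makes it easy to skip a range known to cause trouble.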

Re: nodetool repair -snapshot option?

2014-06-30 Thread Phil Burress
@Paulo, this is very cool! Thanks very much for the link! On Mon, Jun 30, 2014 at 9:37 PM, Paulo Ricardo Motta Gomes paulo.mo...@chaordicsystems.com wrote: If you find it useful, I created a tool where you input the node IP, keyspace, column family, and optionally the number of partitions

Re: nodetool repair loops version 2.0.6

2014-04-09 Thread Kevin McLaughlin
In fact, it did eventually finish in ~20 minutes. Is this duration expected/normal? --Kevin On Wed, Apr 9, 2014 at 9:32 AM, Kevin McLaughlin kmcla...@gmail.com wrote: Have a test cluster with three nodes each in two datacenters. The following causes nodetool repair to go into an (apparent)

Re: nodetool repair loops version 2.0.6

2014-04-09 Thread Robert Coli
On Wed, Apr 9, 2014 at 7:09 AM, Kevin McLaughlin kmcla...@gmail.com wrote: In fact, it did eventually finish in ~20 minutes. Is this duration expected/normal? https://issues.apache.org/jira/browse/CASSANDRA-5220 =Rob

Re: nodetool repair stalled

2014-01-14 Thread Paolo Crosato
I was able to complete the repair, repairing one keyspace and cf each time. However the last session is still shown as an active process, even if the session has been successfully completed, this is the log: INFO [CompactionExecutor:252] 2014-01-14 03:10:13,105 CompactionTask.java (line 275)

Re: nodetool repair stalled

2014-01-08 Thread Robert Coli
On Wed, Jan 8, 2014 at 8:52 AM, Paolo Crosato paolo.cros...@targaubiest.com wrote: I have two nodes with Cassandra 2.0.3, where repair sessions hang for an indefinite time. I'm running nodetool repair once a week on every node, on different days. Currently I have like 4 repair sessions

Re: nodetool repair stalled

2014-01-08 Thread sankalp kohli
Hi, Can you attach the logs around repair. Please do that for node which triggered it and nodes involved in repair. I will try to find something useful. Thanks, Sankalp On Wed, Jan 8, 2014 at 10:18 AM, Robert Coli rc...@eventbrite.com wrote: On Wed, Jan 8, 2014 at 8:52 AM, Paolo Crosato

Re: Nodetool repair exceptions in Cassandra 2.0.2

2013-12-12 Thread David Laube
Thank you for the reply Aaron. Unfortunately, I could not seem to find any additional info in the logs. However, upgrading from 2.0.2 to 2.0.3 seems to have done the trick! Best regards, -David Laube On Dec 11, 2013, at 6:51 PM, Aaron Morton aa...@thelastpickle.com wrote: [2013-12-08

Re: nodetool repair keeping an empty cluster busy

2013-12-11 Thread Rahul Menon
Sven So basically when you run a repair you are essentially telling your cluster to run a validation compaction, which generates a merkle tree on all the nodes. These trees are used to identify the inconsistencies. So there is quite a bit of streaming which you see as your network traffic. Rahul

Re: nodetool repair keeping an empty cluster busy

2013-12-11 Thread Sven Stark
Hi Rahul, thanks for replying. Could you please be a bit more specific, though. Eg what exactly is being compacted - there is/was no data at all in the cluster save for a few hundred kB in the system CF (see the nodetool status output). Or - how can those few hundred kB in data generate Gb of

Re: nodetool repair keeping an empty cluster busy

2013-12-11 Thread Robert Coli
On Wed, Dec 11, 2013 at 1:35 AM, Sven Stark sven.st...@m-square.com.au wrote: thanks for replying. Could you please be a bit more specific, though. Eg what exactly is being compacted - there is/was no data at all in the cluster save for a few hundred kB in the system CF (see the nodetool status

Re: Nodetool repair exceptions in Cassandra 2.0.2

2013-12-11 Thread Aaron Morton
[2013-12-08 11:04:02,047] Repair session ff16c510-5ff7-11e3-97c0-5973cc397f8f for range (1246984843639507027,1266616572749926276] failed with error org.apache.cassandra.exceptions.RepairException: [repair #ff16c510-5ff7-11e3-97c0-5973cc397f8f on keyspace_name/col_family1,

Re: nodetool repair keeping an empty cluster busy

2013-12-10 Thread Sven Stark
Corollary: what is getting shipped over the wire? The ganglia screenshot shows the network traffic on all the three hosts on which I ran the nodetool repair. [image: Inline image 1] remember UN 10.1.2.11 107.47 KB 256 32.9% 1f800723-10e4-4dcd-841f-73709a81d432 rack1 UN 10.1.2.10

Re: Nodetool repair exceptions in Cassandra 2.0.2

2013-12-09 Thread Laing, Michael
My experience is that you must upgrade to 2.0.3 ASAP to fix this. Michael On Mon, Dec 9, 2013 at 6:39 PM, David Laube d...@stormpath.com wrote: Hi All, We are running Cassandra 2.0.2 and have recently stumbled upon an issue with nodetool repair. Upon running nodetool repair on each of the

Re: nodetool repair seems to increase linearly with number of keyspaces

2013-11-26 Thread Christopher J. Bottaro
We only have a single CF per keyspace. Actually we have 2, but one is tiny (only has 2 rows in it and is queried once a month or less). Yup, using vnodes with 256 tokens. Cassandra 1.2.10. -- C On Mon, Nov 25, 2013 at 2:28 PM, John Pyeatt john.pye...@singlewire.com wrote: Mr. Bottaro,

Re: nodetool repair seems to increase linearly with number of keyspaces

2013-11-25 Thread Christopher J. Bottaro
We have the same setup: one keyspace per client, and currently about 300 keyspaces. nodetool repair takes a long time, 4 hours with -pr on a single node. We have a 4 node cluster with about 10 gb per node. Unfortunately, we haven't been keeping track of the running time as keyspaces, or load,

Re: nodetool repair seems to increase linearly with number of keyspaces

2013-11-25 Thread John Pyeatt
Mr. Bottaro, About how many column families are in your keyspaces? We have 28 per keyspace. Are you using Vnodes? We are and they are set to 256 What version of cassandra are you running. We are running 1.2.9 On Mon, Nov 25, 2013 at 11:36 AM, Christopher J. Bottaro

Re: nodetool repair seems to increase linearly with number of keyspaces

2013-11-25 Thread Robert Coli
On Mon, Nov 25, 2013 at 12:28 PM, John Pyeatt john.pye...@singlewire.com wrote: Are you using Vnodes? We are and they are set to 256 What version of cassandra are you running. We are running 1.2.9 Vnode performance vis a vis repair is this JIRA issue :

Re: nodetool repair hung?

2013-03-27 Thread aaron morton
...@yahoo.com Subject: Re: nodetool repair hung? To: user@cassandra.apache.org Check nodetool tpstats and look for AntiEntropySessions/AntiEntropyStages; grep the log for repair and merkle tree. - Original Message - From: S C as...@outlook.com To: user

Re: nodetool repair hung?

2013-03-25 Thread Wei Zhu
Check nodetool tpstats and look for AntiEntropySessions/AntiEntropyStages; grep the log for repair and merkle tree. - Original Message - From: S C as...@outlook.com To: user@cassandra.apache.org Sent: Monday, March 25, 2013 2:55:30 PM Subject: nodetool repair hung? I am

RE: nodetool repair hung?

2013-03-25 Thread S C
Thank you. It helped me. Date: Mon, 25 Mar 2013 15:22:32 -0700 From: wz1...@yahoo.com Subject: Re: nodetool repair hung? To: user@cassandra.apache.org Check nodetool tpstats and look for AntiEntropySessions/AntiEntropyStages; grep the log for repair and merkle tree.

Re: nodetool repair with vnodes

2013-02-18 Thread aaron morton
So, running it periodically on just one node is enough for cluster maintenance ? In the special case where you have RF == Number of nodes. The recommended approach is to use -pr and run it on each node periodically. Also: running it with -pr does output: That does not look right. There

Re: [nodetool] repair with vNodes

2013-02-17 Thread aaron morton
I'm a bit late, but for reference. Repair runs in two stages: first, differences are detected. You can monitor the validation compaction with nodetool compactionstats. Then the differences are streamed between the nodes; you can monitor that with nodetool netstats. Nodetool repair command
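
The two stages described above can be checked from a script. The following Python sketch is an illustration only: the function names are hypothetical, and the marker strings it searches for ("validation" in the `nodetool compactionstats` output, "streaming" in the `nodetool netstats` output) are assumptions that vary between Cassandra versions, so compare them against real output from your cluster and pass different markers if needed.

```python
import subprocess


def nodetool(host, command):
    """Run `nodetool -h <host> <command>` and return its stdout as text."""
    result = subprocess.run(
        ["nodetool", "-h", host, command],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


def repair_phase(compactionstats_text, netstats_text,
                 validation_marker="validation", streaming_marker="streaming"):
    """Classify the repair stage from the text of the two nodetool commands.

    The default markers are assumptions, not a documented output format.
    """
    if validation_marker in compactionstats_text.lower():
        return "validating"   # stage 1: validation compaction in progress
    if streaming_marker in netstats_text.lower():
        return "streaming"    # stage 2: differences being streamed
    return "idle"


def current_phase(host="127.0.0.1"):
    """Query a live node; requires nodetool on the PATH."""
    return repair_phase(nodetool(host, "compactionstats"),
                        nodetool(host, "netstats"))
```

Keeping the classification separate from the subprocess call means the markers can be tuned against captured output without touching a running node.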

Re: nodetool repair with vnodes

2013-02-17 Thread aaron morton
…so it seems to me that it is running on all vnodes ranges. Yes. Also, whatever the node which I launch the command on is, only one node log is moving and is always the same node. Not sure what you mean here. So, to me, it's like the nodetool repair command is running always on the same

Re: nodetool repair with vnodes

2013-02-17 Thread Marco Matarazzo
So, to me, it's like the nodetool repair command is running always on the same single node and repairing everything. If you use nodetool repair without the -pr flag in your setup (3 nodes and I assume RF 3) it will repair all token ranges in the cluster. That's correct, 3 nodes and RF 3.

Re: Nodetool repair, exit code/status?

2012-10-09 Thread Edward Sargisson
This is a problem for us as well. Our current planned approach is to parse the logs for repair errors. Having nodetool repair return an exit code for some of these failures would be *very* useful. Cheers, Edward On 12-10-08 06:49 PM, David Daeschler wrote: Hello. In the process of trying to
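
The log-parsing approach mentioned above could look roughly like the Python sketch below. The regular expression is modelled on the single "Repair session ... failed with error" line quoted in the 2.0.2 thread in this listing; other versions may word failures differently, so treat the pattern, the function names, and the exit-code convention as assumptions.

```python
import re

# Modelled on: "Repair session <uuid> for range (<start>,<end>] failed with error ..."
FAILED_SESSION = re.compile(
    r"Repair session (?P<session>[0-9a-f-]+) for range (?P<range>\(\S+\]) failed"
)


def failed_sessions(lines):
    """Return (session id, token range) for every failed repair session found."""
    failures = []
    for line in lines:
        match = FAILED_SESSION.search(line)
        if match:
            failures.append((match.group("session"), match.group("range")))
    return failures


def exit_code(lines):
    """0 if no failed session was seen, 1 otherwise (a convention chosen here)."""
    return 1 if failed_sessions(lines) else 0
```

A wrapper script could capture the output of `nodetool repair`, split it into lines, and call `sys.exit(exit_code(lines))` so that cron or a scheduler sees the failure.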

Re: Nodetool repair and Leveled Compaction

2012-09-26 Thread Omid Aladini
I think this JIRA answers your question: https://issues.apache.org/jira/browse/CASSANDRA-2610 In order not to duplicate work (creation of Merkle trees), repair is done on all replicas for a range. Cheers, Omid On Tue, Sep 25, 2012 at 8:27 AM, Sergey Tryuber stryu...@gmail.com wrote: Hi

Re: Nodetool repair and Leveled Compaction

2012-09-25 Thread Sergey Tryuber
Hi Radim Unfortunately number of compaction tasks is not overestimated. The number is decremented one-by-one and this process takes several hours for our 40GB node(( Also, when a lot of compaction tasks appears, we see that total disk space used (via JMX) is doubled and Cassandra really tries to

Re: Nodetool repair and Leveled Compaction

2012-09-24 Thread Radim Kolar
Repair process by itself is going well in the background, but the issue I'm concerned about is a lot of unnecessary compaction tasks. The number in the compaction tasks counter is overestimated. For example, I have 1100 tasks left, and if I stop inserting data, all tasks will finish within 30 minutes. I

Re: nodetool repair - when is it not needed ?

2012-08-23 Thread aaron morton
HH works to a point. Specifically, it only collects hints for the first hour the node is down and it has a safety valve to avoid the node collecting hints getting overwhelmed. Looking at the code it takes a bit for that to trip and you would get a TimeoutException coming back. Also when

Re: nodetool repair - when is it not needed ?

2012-08-23 Thread aaron morton
Also when hints are replayed they are sent off as mutations, which may still be dropped by the target if they are not serviced before rpc_timeout. Sending nodes throttle their requests so it's unlikely but possible. My bad there. I thought the mutations were sent one way. When node is

Re: nodetool repair - when is it not needed ?

2012-08-22 Thread Rob Coli
On Wed, Aug 22, 2012 at 8:37 AM, Senthilvel Rangaswamy senthil...@gmail.com wrote: We are running Cassandra 1.1.2 on EC2. Our database is primarily all counters and we don't do any deletes. Does nodetool repair do anything for such a database. All the docs I read for nodetool repair suggests

Re: nodetool repair uses insane amount of disk space

2012-08-17 Thread aaron morton
I would take a look at the replication: what's the RF per DC and what does nodetool ring say. It's hard (as in not recommended) to get NTS with rack allocation working correctly. Without knowing much more I would try to understand what the topology is and if it can be simplified. Additionally,

Re: nodetool repair uses insane amount of disk space

2012-08-17 Thread Jim Cistaro
...@thelastpickle.com Reply-To: user@cassandra.apache.org Date: Fri, 17 Aug 2012 20:40:54 +1200 To: user@cassandra.apache.org Subject: Re: nodetool repair uses insane amount of disk space I would take a look

Re: nodetool repair uses insane amount of disk space

2012-08-17 Thread Peter Schuller
How come a node would consume 5x its normal data size during the repair process? https://issues.apache.org/jira/browse/CASSANDRA-2699 It's likely a variation based on how out of synch you happen to be, and whether you have a neighbor that's also been repaired and bloated up already. My setup

Re: nodetool repair uses insane amount of disk space

2012-08-16 Thread aaron morton
What version are you using? There were issues with repair using lots-o-space in 0.8.X, it's fixed in 1.X Cheers - Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com On 17/08/2012, at 2:56 AM, Michael Morris michael.m.mor...@gmail.com wrote: Occasionally

Re: nodetool repair uses insane amount of disk space

2012-08-16 Thread Michael Morris
Upgraded to 1.1.3 from 1.0.8 about 2 weeks ago. On Thu, Aug 16, 2012 at 5:57 PM, aaron morton aa...@thelastpickle.com wrote: What version are you using? There were issues with repair using lots-o-space in 0.8.X, it's fixed in 1.X Cheers - Aaron Morton Freelance Developer

Re: nodetool repair

2012-07-15 Thread Michael Theroux
So, if I have a 6 node cluster in the token ring, A-B-C-D-E-F, replication factor 3, and I run repair (without -pr) on A, is the flow of information: A synchronizes information it is responsible for with B and C (because B and C are replicas of A). A, as a replica of E and F, synchronizes E and
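
The ring in this question can be worked through with a small Python sketch. It assumes the simplest placement rule (each node's primary range is replicated to that node plus the next RF-1 nodes clockwise, as with SimpleStrategy and one token per node); the function names are hypothetical, and the sketch models only which nodes take part, not how Cassandra schedules the repair sessions.

```python
def replicas_for(owner_index, nodes, rf):
    """Nodes storing the primary range of nodes[owner_index]: owner plus next rf-1."""
    return [nodes[(owner_index + i) % len(nodes)] for i in range(rf)]


def ranges_held_by(node, nodes, rf):
    """Owners of every primary range for which `node` stores a replica."""
    return [owner for i, owner in enumerate(nodes)
            if node in replicas_for(i, nodes, rf)]


def repair_peers(node, nodes, rf, primary_only):
    """Other nodes involved when repair is started on `node`.

    primary_only=True models `repair -pr`; False models repair without -pr.
    """
    owners = [node] if primary_only else ranges_held_by(node, nodes, rf)
    peers = set()
    for owner in owners:
        peers.update(replicas_for(nodes.index(owner), nodes, rf))
    peers.discard(node)
    return sorted(peers)


if __name__ == "__main__":
    ring = ["A", "B", "C", "D", "E", "F"]
    print(ranges_held_by("A", ring, 3))                    # ranges A stores
    print(repair_peers("A", ring, 3, primary_only=True))   # with -pr
    print(repair_peers("A", ring, 3, primary_only=False))  # without -pr
```

Under these assumptions, A stores the ranges owned by A, E and F, so a repair without -pr on A involves B, C, E and F, while -pr on A involves only B and C. That is why -pr has to be run on every node to cover the whole ring.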

Re: nodetool repair -- should I schedule a weekly one ?

2012-06-07 Thread ruslan usifov
Yes, with CL ONE you can get an inconsistent read when one of your nodes dies and the dynamic snitch doesn't do its job 2012/6/7 Oleg Dulin oleg.du...@gmail.com: We have a 3-node cluster. We use RF of 3 and CL of ONE for both reads and writes…. Is there a reason I should schedule a regular

Re: nodetool repair -- should I schedule a weekly one ?

2012-06-07 Thread ruslan usifov
Sorry, not dynamic snitch, but hinted handoff. Remember Cassandra is eventually consistent 2012/6/8 ruslan usifov ruslan.usi...@gmail.com: Yes, with CL ONE you can get an inconsistent read when one of your nodes dies and the dynamic snitch doesn't do its job 2012/6/7 Oleg Dulin

RE: nodetool repair -pr enough in this scenario?

2012-06-05 Thread Viktor Jevdokimov
Understand simple mechanics first, decide how to act later. Without -PR there's no difference from which host to run repair, it runs for the whole 100% range, from start to end, the whole cluster, all nodes, at once. With -PR it runs only for a primary range of a node you are running a repair.

Re: nodetool repair -pr enough in this scenario?

2012-06-05 Thread R. Verlangen
In your case -pr would be just fine (see Viktor's explanation). 2012/6/5 Viktor Jevdokimov viktor.jevdoki...@adform.com Understand simple mechanics first, decide how to act later. Without -PR there's no difference from which host to run repair, it runs for the whole 100% range,

Re: nodetool repair -pr enough in this scenario?

2012-06-05 Thread Sylvain Lebresne
On Tue, Jun 5, 2012 at 8:44 AM, Viktor Jevdokimov viktor.jevdoki...@adform.com wrote: Understand simple mechanics first, decide how to act later. Without -PR there's no difference from which host to run repair, it runs for the whole 100% range, from start to end, the whole

RE: nodetool repair -pr enough in this scenario?

2012-06-05 Thread Viktor Jevdokimov
@cassandra.apache.org Subject: Re: nodetool repair -pr enough in this scenario? On Tue, Jun 5, 2012 at 8:44 AM, Viktor Jevdokimov viktor.jevdoki...@adform.com wrote: Understand simple mechanics first, decide how to act later. Without -PR there's no difference from which host

Re: nodetool repair -pr enough in this scenario?

2012-06-05 Thread aaron morton
and irrevocably delete this message and any copies. From: Sylvain Lebresne [mailto:sylv...@datastax.com] Sent: Tuesday, June 05, 2012 11:02 To: user@cassandra.apache.org Subject: Re: nodetool repair -pr enough in this scenario? On Tue, Jun 5, 2012 at 8:44 AM, Viktor Jevdokimov

Re: nodetool repair -pr enough in this scenario?

2012-06-05 Thread David Daeschler
Thank you for all the replies. It has been enlightening to read. I think I now have a better idea of repair, ranges, replicas and how the data is distributed. It also seems that using -pr would be the best way to go in my scenario with 1.x+ Thank you for all the feedback. Glad to see such an

Re: nodetool repair taking forever

2012-05-25 Thread Raj N
Thanks for the reply Aaron. By compaction being on, if you mean whether I run nodetool compact, then the answer is no. I haven't set any explicit compaction_thresholds which means it should be using the default, min 4 and max 32. Having said that, to solve the problem, I just did a full cluster restart

Re: nodetool repair taking forever

2012-05-25 Thread Rob Coli
On Sat, May 19, 2012 at 8:14 AM, Raj N raj.cassan...@gmail.com wrote: Hi experts, [ repair seems to be hanging forever ] https://issues.apache.org/jira/browse/CASSANDRA-2433 Affects 0.8.4. I also believe there is a contemporaneous bug (reported by Stu Hood?) regarding failed repair resulting

Re: nodetool repair taking forever

2012-05-22 Thread aaron morton
I also don't understand, if all these nodes are replicas of each other, why is it that the first node has almost double the data. Have you performed any token moves? Old data is not deleted unless you run nodetool cleanup. Another possibility is things like a lot of hints. Admittedly it would have

Re: nodetool repair requirement

2012-05-14 Thread aaron morton
Personally I would. Repair is *the* way to ensure data is fully distributed. Hinted Hand Off and Read Repair are considered optimisations designed to reduce the chance of an inconsistency during a read. Cheers - Aaron Morton Freelance Developer @aaronmorton

Re: nodetool repair requirement

2012-05-13 Thread Kamal Bahadur
As per the documentation, you don't have to if you don't delete or update. On Sun, May 13, 2012 at 9:18 AM, Thanh Ha javaby...@gmail.com wrote: Hi All, Do I have to do maintenance nodetool repair on CFs that do not have deletions? I only perform deletes on two column families in my

Re: nodetool repair requirement

2012-05-13 Thread Thanh Ha
Thanks Kamal On Sun, May 13, 2012 at 9:30 AM, Kamal Bahadur mailtoka...@gmail.com wrote: As per the documentation, you don't have to if you don't delete or update. On Sun, May 13, 2012 at 9:18 AM, Thanh Ha javaby...@gmail.com wrote: Hi All, Do I have to do maintenance nodetool repair on

Re: nodetool repair requirement

2012-05-13 Thread Igor
On 05/13/2012 07:18 PM, Thanh Ha wrote: Hi All, Do I have to do maintenance nodetool repair on CFs that do not have deletions? Probably you should (depending how you do reads), if your nodes for some reasons have different data (like connectivity problems, node down, etc). I only perform

Re: nodetool repair cassandra 0.8.4 HELP!!!

2012-04-29 Thread Watanabe Maki
You should run repair. If the disk space is the problem, try to cleanup and major compact before repair. You can limit the streaming data by running repair for each column family separately. maki On 2012/04/28, at 23:47, Raj N raj.cassan...@gmail.com wrote: I have a 6 node cassandra cluster

Re: nodetool repair cassandra 0.8.4 HELP!!!

2012-04-29 Thread Raj N
I tried it on 1 column family. I believe there is a bug in 0.8* where repair ignores the cf. I tried this multiple times on different nodes. Every time the disk util was going up to 80% on a 500 GB disk. I would eventually kill the repair. I only have 60GB worth data. I see this JIRA -

Re: nodetool repair cassandra 0.8.4 HELP!!!

2012-04-29 Thread aaron morton
When you start a node does it log that it's opening SSTables ? After starting what does nodetool cfstats say for the node ? Can you connect with cassandra-cli and do a get ? Cheers - Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com On 29/04/2012, at

Re: nodetool repair hanging

2012-04-26 Thread Bill Au
My cluster is very small (300 MB) and compact was taking more than 2 hours. I ended up bouncing all the nodes. After that, I was able to run repair on all nodes, and each one takes less than a minute. If this happens again I will be sure to run compactionstats and netstats. Thanks for that

Re: nodetool repair hanging

2012-04-25 Thread Gregg Ulrich
How much data do you have and how long is a while? In my experience repairs can take a very long time. Check to see if validation compactions are running (nodetool compactionstats) or if files are streaming (nodetool netstats). If either of those are in progress then your repair should be

Re: nodetool repair does not return...

2011-08-25 Thread Boris Yen
We tried to dump the stack trace of threads, we noticed that manual-repair-d08349af-189f-47cb-9cc3-452538ce04d1 daemon prio=10 tid=0x406a3000 nid=0x1890 waiting on condition [0x7f5c97be8000] java.lang.Thread.State: WAITING (parking) at sun.misc.Unsafe.park(Native Method)

Re: nodetool repair does not return...

2011-08-25 Thread aaron morton
That's a thread waiting for other threads / activities to complete. Nothing unusual there. Work out how far the repair gets. Is there a validation compaction listed in nodetool compactionstats ? Are there any streams running in nodetool netstats ? Look through the logs on the machine you

Re: nodetool repair does not return...

2011-08-24 Thread Boris Yen
Would Cassandra-2433 cause this? On Wed, Aug 24, 2011 at 7:23 PM, Boris Yen yulin...@gmail.com wrote: Hi, In our testing environment, we got two nodes with RF=2 running 0.8.4. We tried to test the repair functions of cassandra, however, every once in a while, the nodetool repair never returns.

Re: nodetool repair caused high disk space usage

2011-08-23 Thread Héctor Izquierdo Seliva
On Sat, 20-08-2011 at 01:22 +0200, Peter Schuller wrote: Is there any chance that the entire file from source node got streamed to destination node even though only small amount of data in the file from source node is supposed to be streamed to destination node? Yes, but the thing

Re: nodetool repair caused high disk space usage

2011-08-22 Thread Huy Le
After having done so many tries, I am not sure which log entries correspond to what. However, there were many of this type: WARN [CompactionExecutor:14] 2011-08-18 18:47:00,596 CompactionManager.java (line 730) Index file contained a different key or row size; using key from data file And

Re: nodetool repair caused high disk space usage

2011-08-21 Thread Philippe
Do you have an indication that at least the disk space is in fact consistent with the amount of data being streamed between the nodes? I think you had 90 - ~ 450 gig with RF=3, right? Still sounds like a lot assuming repairs are not running concurrently (and compactions are able to run after

Re: nodetool repair caused high disk space usage

2011-08-20 Thread Philippe
Péter, In our case they get created exclusively during repairs. Compactionstats showed a huge number of sstable build compactions On Aug 20, 2011 1:23 AM, Peter Schuller peter.schul...@infidyne.com wrote: Is there any chance that the entire file from source node got streamed to destination node

Re: nodetool repair caused high disk space usage

2011-08-20 Thread Peter Schuller
In our case they get created exclusively during  repairs. Compactionstats showed a huge number of sstable build compactions Do you have an indication that at least the disk space is in fact consistent with the amount of data being streamed between the nodes? I think you had 90 - ~ 450 gig with

Re: Nodetool repair takes 4+ hours for about 10G data

2011-08-19 Thread Peter Schuller
The compaction settings do not affect repair. (Thinking out loud, or does it ? Validation compactions and table builds.) It does. -- / Peter Schuller (@scode on twitter)

Re: Nodetool repair takes 4+ hours for about 10G data

2011-08-19 Thread Peter Schuller
Is it normal that the repair takes 4+ hours for every node, with only about 10G data? If this is not expected, do we have any hint what could be causing this? It does not seem entirely crazy, depending on the nature of your data and how CPU-intensive it is per byte to compact. Assuming

Re: nodetool repair caused high disk space usage

2011-08-19 Thread Peter Schuller
After upgrading to cass 0.8.4 from cass 0.6.11.  I ran scrub.  That worked fine.  Then I ran nodetool repair on one of the nodes.  The disk usage on data directory increased from 40GB to 480GB, and it's still growing. If you check your data directory, does it contain a lot of *Compacted files?

Re: nodetool repair caused high disk space usage

2011-08-19 Thread Huy Le
There were few Compacted files. I thought that might have been the cause, but it wasn't it. We have a CF that is 23GB, and while repair is running, there are multiple instances of that CF created along with other CFs. I checked the stream directory across cluster of four nodes, but it was

Re: nodetool repair caused high disk space usage

2011-08-19 Thread Peter Schuller
There were few Compacted files.  I thought that might have been the cause, but it wasn't it.  We have a CF that is 23GB, and while repair is running, there are multiple instances of that CF created along with other CFs. To confirm - are you saying the data directory size is huge, but the live

Re: nodetool repair caused high disk space usage

2011-08-19 Thread Huy Le
To confirm - are you saying the data directory size is huge, but the live size as reported by nodetool ring and nodetool info does NOT reflect this inflated size? That's correct. What files *do* you have in the data directory? Any left-over *tmp* files for example? The files that

Re: nodetool repair caused high disk space usage

2011-08-19 Thread Peter Schuller
Is there any chance that the entire file from source node got streamed to destination node even though only small amount of data in the file from source node is supposed to be streamed to destination node? Yes, but the thing that's annoying me is that even if so - you should not be seeing a 40 gb

Re: nodetool repair caused high disk space usage

2011-08-18 Thread Huy Le
Philippe, Besides the system keyspace, we have only one user keyspace. However, tell me that we can also try repairing one CF at a time. We have two concurrent compactors configured. Will change that to one. Huy On Wed, Aug 17, 2011 at 6:10 PM, Philippe watche...@gmail.com wrote: Huy,

Re: nodetool repair caused high disk space usage

2011-08-18 Thread Philippe
Unfortunately repairing one cf at a time didn't help in my case because it still streams all CF and that triggers lots of compactions On Aug 18, 2011 3:48 PM, Huy Le hu...@springpartners.com wrote:

Re: nodetool repair caused high disk space usage

2011-08-18 Thread Huy Le
Thanks. I won't try that then. So in our environment, after upgrading from 0.6.11 to 0.8.4, we have to run scrub on all nodes before we can run repair on them. Is there any chance that running scrub on the nodes causing data from all SSTables being streamed to/from other nodes on running

Re: nodetool repair caused high disk space usage

2011-08-18 Thread aaron morton
No, scrub is a local operation only. Cheers - Aaron Morton Freelance Cassandra Developer @aaronmorton http://www.thelastpickle.com On 19/08/2011, at 6:36 AM, Huy Le wrote: Thanks. I won't try that then. So in our environment, after upgrading from 0.6.11 to 0.8.4, we have

Re: Nodetool repair takes 4+ hours for about 10G data

2011-08-18 Thread aaron morton
The compaction settings do not affect repair. (Thinking out loud, or do they? Validation compactions and table builds.) Watch the logs or check nodetool compactionstats to see when the Validation completes, and nodetool netstats to see how long the data transfer takes. It sounds a
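
The two checks mentioned in this message can be sketched as a small helper. This is only an illustration, not something from the thread: the function names and the host value are hypothetical, and the script just builds the `nodetool -h <host> compactionstats` and `nodetool -h <host> netstats` command lines without running them.

```python
import shlex


def nodetool_cmd(host, *args):
    """Build a nodetool command as an argument list (nothing is executed)."""
    return ["nodetool", "-h", host, *args]


def repair_progress_cmds(host):
    """Commands to check validation compactions and streaming on one node.

    compactionstats shows whether the Validation phase is still running;
    netstats shows the streams that follow it.
    """
    return [
        nodetool_cmd(host, "compactionstats"),
        nodetool_cmd(host, "netstats"),
    ]


if __name__ == "__main__":
    # "10.0.0.1" is a placeholder address, assumed for the example.
    for cmd in repair_progress_cmds("10.0.0.1"):
        print(shlex.join(cmd))
```

Each list could be handed to `subprocess.run` on a machine where nodetool is installed; polling both periodically gives a rough picture of where the repair time goes.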

Re: nodetool repair caused high disk space usage

2011-08-17 Thread Philippe
Look at my last two or three threads. I've encountered the same thing and got some pointers/answers. On Aug 17, 2011 4:03 PM, Huy Le hu...@springpartners.com wrote: Hi, After upgrading from Cassandra 0.6.11 to 0.8.4, I ran scrub. That worked fine. Then I ran nodetool repair on one of the

Re: nodetool repair caused high disk space usage

2011-08-17 Thread Huy Le
I restarted the cluster and kicked off repair on the same node again. It only made matters worse. It filled up the 830GB partition, and Cassandra crashed on the node where repair was running. I restarted it, and now I am running compaction to reduce disk usage. Repair after upgrading to 0.8.4 is still

Re: nodetool repair caused high disk space usage

2011-08-17 Thread Philippe
Huy, Have you tried repairing one keyspace at a time and then giving it some breathing time to compact? My current observation is that the streams of repairs are triggering massive compactions which are filling up my disks too. Another idea I'd like to try is to limit the number of concurrent

Re: nodetool repair: No neighbors

2011-07-31 Thread Sylvain Lebresne
On Sun, Jul 31, 2011 at 2:25 AM, Jason Baker ja...@apture.com wrote: When I run nodetool repair on a node on my 3-node cluster, I see 3 messages like the following:  INFO [manual-repair-6d9a617f-c496-4744-9002-a56909b83d5b] 2011-07-30 18:50:28,464 AntiEntropyService.java (line 636) No

Re: nodetool repair: No neighbors

2011-07-31 Thread Norman Maurer
I created an issue and attached a patch: https://issues.apache.org/jira/browse/CASSANDRA-2979 I was not sure if it would be better to handle it in NodeProbe or StorageService.. Bye, Norman 2011/7/31 Sylvain Lebresne sylv...@datastax.com: On Sun, Jul 31, 2011 at 2:25 AM, Jason Baker

Re: nodetool repair: No neighbors

2011-07-30 Thread Jonathan Ellis
I would guess that means you've only configured a single replica per row. On Sat, Jul 30, 2011 at 7:25 PM, Jason Baker ja...@apture.com wrote: When I run nodetool repair on a node on my 3-node cluster, I see 3 messages like the following:  INFO

Re: nodetool repair mykeyspace mycolumnfamily repairs all the keyspace

2011-07-19 Thread Jonathan Ellis
https://issues.apache.org/jira/browse/CASSANDRA-2280 2011/7/19 Héctor Izquierdo Seliva izquie...@strands.com: Hi all, Maybe I'm doing something wrong, but calling ./nodetool -h host repair mykeyspace mycolumnfamily should only repair mycolumnfamily right? Everytime I try a repair it repairs

Re: nodetool repair mykeyspace mycolumnfamily repairs all the keyspace

2011-07-19 Thread Héctor Izquierdo Seliva
Are there any plans to backport this to 0.8? El mar, 19-07-2011 a las 11:43 -0500, Jonathan Ellis escribió: https://issues.apache.org/jira/browse/CASSANDRA-2280 2011/7/19 Héctor Izquierdo Seliva izquie...@strands.com: Hi all, Maybe I'm doing something wrong, but calling ./nodetool -h

Re: nodetool repair mykeyspace mycolumnfamily repairs all the keyspace

2011-07-19 Thread Jonathan Ellis
Short answer: no. Long answer: https://issues.apache.org/jira/browse/CASSANDRA-2818 2011/7/19 Héctor Izquierdo Seliva izquie...@strands.com: Are there any plans to backport this to 0.8? El mar, 19-07-2011 a las 11:43 -0500, Jonathan Ellis escribió:

Re: nodetool repair question

2011-07-05 Thread Edward Capriolo
On Tue, Jul 5, 2011 at 1:27 PM, Raj N raj.cassan...@gmail.com wrote: Hi experts, Are there any benchmarks that quantify how long nodetool repair takes? Something which says on this kind of hardware, with this much of data, nodetool repair takes this long. The other question that I have
