I have a six node cluster in AWS (repl:3) and recently noticed that repair
was hanging. I've run with the -pr switch.
I see this output in the nodetool command line (and also in that node's
system.log):
Starting repair command #9, repairing 256 ranges for keyspace dev_a
but then no further output ever appears.
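A minimal sketch of how one might check whether such a repair is actually progressing (a hedged aside, not from the thread; the nodetool subcommands below exist in the 1.2.x line, and dev_a is the keyspace named in the post):

```shell
# Hedged sketch: a repair that prints nothing can sometimes still be seen
# working via validation compactions and streaming. Guarded so the snippet
# is a no-op on a machine without nodetool.
ks="dev_a"   # keyspace name taken from the post
if command -v nodetool >/dev/null 2>&1; then
  nodetool compactionstats   # repair runs "Validation" compactions, visible here
  nodetool netstats          # inter-replica streaming shows up here
fi
```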
If the boxes are idle, you could use jstack and look at the stack… perhaps
it's locked somewhere.
Worth a shot.
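The suggestion above might look something like this in practice (a hedged sketch; the process/thread names are assumptions based on 1.2.x conventions, where repair sessions run on AntiEntropy* threads — verify against your own dump):

```shell
# Hedged sketch: take a thread dump from the Cassandra JVM and pull out
# repair-related threads. Guarded so it is a no-op where no Cassandra
# process is running.
dump_repair_threads() {
  # $1: path to a saved jstack output file
  grep -A 3 'AntiEntropy' "$1" || true
}

pid=$(pgrep -f CassandraDaemon | head -n 1)   # assumes one Cassandra per host
if [ -n "$pid" ]; then
  jstack "$pid" > /tmp/cassandra-stack.txt
  dump_repair_threads /tmp/cassandra-stack.txt
fi
```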
On Tue, Jul 1, 2014 at 9:24 AM, Brian Tarbox tar...@cabotresearch.com wrote:
> I have a six node cluster in AWS (repl:3) and recently noticed that repair
> was hanging. I've run with the -pr switch.
On Tue, Jul 1, 2014 at 9:24 AM, Brian Tarbox tar...@cabotresearch.com wrote:
> I have a six node cluster in AWS (repl:3) and recently noticed that repair
> was hanging. I've run with the -pr switch.
It'll do that.
What version of Cassandra?
=Rob
We're running 1.2.13.
Any chance that doing a rolling-restart would help?
Would running without the -pr improve the odds?
Thanks.
On Tue, Jul 1, 2014 at 1:40 PM, Robert Coli rc...@eventbrite.com wrote:
> On Tue, Jul 1, 2014 at 9:24 AM, Brian Tarbox tar...@cabotresearch.com
> wrote:
> > I have a six node cluster in AWS (repl:3) and recently noticed that repair
> > was hanging. I've run with the -pr switch.
Does this output from jstack indicate a problem?

"ReadRepairStage:12170" daemon prio=10 tid=0x7f9dcc018800 nid=0x7361 waiting on condition [0x7f9db540c000]
   java.lang.Thread.State: TIMED_WAITING (parking)
        at sun.misc.Unsafe.park(Native Method)
        - parking to wait for
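(A hedged editorial aside, not part of the thread: a thread parked in sun.misc.Unsafe.park with state TIMED_WAITING is normally just an idle pool worker waiting for work, not by itself evidence of a hang. One quick way to get an overview of a whole dump is to count thread states:)

```shell
# Hedged sketch: summarize the thread states in a saved jstack dump.
count_states() {
  # $1: path to a saved jstack output file
  # Lines look like "   java.lang.Thread.State: TIMED_WAITING (parking)",
  # so the state name is the second whitespace-separated field.
  grep 'java.lang.Thread.State:' "$1" | awk '{print $2}' | sort | uniq -c
}
```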
On Tue, Jul 1, 2014 at 11:09 AM, Brian Tarbox tar...@cabotresearch.com wrote:
> We're running 1.2.13.
1.2.17 contains a few streaming fixes which might help.
> Any chance that doing a rolling-restart would help?
Probably not.
> Would running without the -pr improve the odds?
No, that'd
Given that an upgrade is (for various internal reasons) not an option at
this point...is there anything I can do to get repair working again? I'll
also mention that I see this behavior from all nodes.
Thanks.
On Tue, Jul 1, 2014 at 2:51 PM, Robert Coli rc...@eventbrite.com wrote:
On Tue, Jul 1, 2014 at 11:54 AM, Brian Tarbox tar...@cabotresearch.com wrote:
> Given that an upgrade is (for various internal reasons) not an option at
> this point...is there anything I can do to get repair working again? I'll
> also mention that I see this behavior from all nodes.
I think
> For what purpose are you running repair?
Because I read that we should! :-)
We do delete data from one column family quite regularly...from the other
CFs occasionally. We almost never run with less than 100% of our nodes up.
In this configuration do we *need* to run repair?
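(A hedged editorial aside, not an answer given in the thread: the usual reason repair is recommended when you delete regularly is tombstone propagation — a delete must reach every replica within gc_grace_seconds, or the tombstone can be garbage-collected on some replicas and the deleted data resurrected. The stock default is 864000 seconds:)

```shell
# Hedged aside: 864000 is the stock default for gc_grace_seconds,
# i.e. ten days; the common rule of thumb is to finish a repair of
# each node at least once per gc_grace period.
default_gc_grace=864000
echo "default gc_grace_seconds = ${default_gc_grace} ($((default_gc_grace / 86400)) days)"
```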
Thanks,