Re: quietness of full nodetool repair on large dataset

2017-09-28 Thread Jeff Jirsa
Run it under screen, and/or use subrange repair (e.g. Reaper).
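A minimal sketch of the subrange idea. It assumes the default Murmur3Partitioner and that `start`/`end` are one of the node's own token ranges (for example, read from `nodetool describering <keyspace>`). `split_token_range` and `repair_commands` are hypothetical helpers, not part of Cassandra or Reaper; only the `--full`, `-st` (start token) and `-et` (end token) options belong to `nodetool repair` itself. Because each subrange is its own smaller repair, each invocation finishes and reports back sooner than one full repair that stays silent for hours.

```python
from typing import List, Tuple

# Token bounds of Murmur3Partitioner (assumed to be the partitioner in use).
MIN_TOKEN = -(2**63)
MAX_TOKEN = 2**63 - 1


def split_token_range(start: int, end: int, parts: int) -> List[Tuple[int, int]]:
    """Split the token range (start, end] into up to `parts` contiguous subranges.

    Assumes a non-wrapping range (start < end); a wrapping range would first
    have to be cut in two at MAX_TOKEN / MIN_TOKEN.
    """
    if not (MIN_TOKEN <= start < end <= MAX_TOKEN):
        raise ValueError("expected MIN_TOKEN <= start < end <= MAX_TOKEN")
    if parts < 1:
        raise ValueError("parts must be >= 1")
    width = end - start
    # Integer boundaries; bounds[0] == start and bounds[parts] == end exactly.
    bounds = [start + (width * i) // parts for i in range(parts + 1)]
    # Drop empty subranges, which occur only when parts > width.
    return [(bounds[i], bounds[i + 1]) for i in range(parts) if bounds[i] < bounds[i + 1]]


def repair_commands(keyspace: str, start: int, end: int, parts: int) -> List[List[str]]:
    """Build one full `nodetool repair` invocation per subrange (hypothetical helper)."""
    return [
        ["nodetool", "repair", "--full", "-st", str(s), "-et", str(e), keyspace]
        for s, e in split_token_range(start, end, parts)
    ]
```

Usage, with a hypothetical keyspace name: `for cmd in repair_commands("my_keyspace", start, end, 16): subprocess.run(cmd, check=True)`. Running the subranges one after another keeps each session short. Reaper automates this splitting and scheduling, which is why it was suggested.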

-- 
Jeff Jirsa


> On Sep 28, 2017, at 8:23 PM, Mitch Gitman  wrote:
> 
> I'm on Apache Cassandra 3.10. I'm interested in moving over to Reaper for 
> repairs, but in the meantime, I want to get nodetool repair working a little 
> more gracefully. 
> 
> What I'm noticing is that, when I'm running a repair for the first time with 
> the --full option after a large initial load of data, the client will say 
> it's starting on a repair job and then cease to produce any output for not 
> just minutes but a few hours. This causes SSH inactivity timeouts. I have 
> tried running the repair with the --trace option, but then that leads to the 
> other extreme where there's just a torrent of output, scarcely any of which 
> I'll typically need. 
> 
> As a literal solution to my SSH inactivity timeouts, I could extend the 
> timeouts, or I could do some scripting jujitsu with StrictHostKeyChecking=no 
> and a loop that spits some arbitrary output until the command finishes. But 
> even if the timeouts were no concern, the sheer unresponsiveness is apt to 
> make an operator nervous. And I'd like to think there's a Goldilocks way to 
> run a full nodetool repair on a large dataset where it's just a bit more 
> responsive without going all TMI. Thoughts? Anyone else notice this?




quietness of full nodetool repair on large dataset

2017-09-28 Thread Mitch Gitman
I'm on Apache Cassandra 3.10. I'm interested in moving over to Reaper for
repairs, but in the meantime, I want to get nodetool repair working a
little more gracefully.

What I'm noticing is that, when I'm running a repair for the first time
with the --full option after a large initial load of data, the client will
say it's starting on a repair job and then cease to produce any output for
not just minutes but a few hours. This causes SSH inactivity timeouts. I
have tried running the repair with the --trace option, but then that leads
to the other extreme where there's just a torrent of output, scarcely any
of which I'll typically need.

As a literal solution to my SSH inactivity timeouts, I could extend the
timeouts, or I could do some scripting jujitsu with
StrictHostKeyChecking=no and a loop that spits some arbitrary output until
the command finishes. But even if the timeouts were no concern, the sheer
unresponsiveness is apt to make an operator nervous. And I'd like to think
there's a Goldilocks way to run a full nodetool repair on a large dataset
where it's just a bit more responsive without going all TMI. Thoughts?
Anyone else notice this?
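One way to get the "Goldilocks" behaviour without `--trace` is a wrapper that passes the repair's normal output through and prints a single heartbeat line only when the command has been silent for a while. The sketch below is an illustration, not a nodetool feature: `run_with_heartbeat` and its `quiet_after` parameter are hypothetical names. For the SSH side specifically, the OpenSSH client option aimed at idle sessions is `ServerAliveInterval`, not `StrictHostKeyChecking`. Whether it helps depends on what is enforcing the timeout.

```python
import subprocess
import sys
import threading
import time
from typing import List


def run_with_heartbeat(cmd: List[str], quiet_after: float = 60.0) -> int:
    """Run `cmd`, echo its output, and print a heartbeat line whenever the
    command has produced no output for `quiet_after` seconds.
    Returns the command's exit code."""
    start = time.monotonic()
    last_output = start
    lock = threading.Lock()
    proc = subprocess.Popen(
        cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True
    )

    def pump() -> None:
        # Forward each output line and record when we last saw output.
        nonlocal last_output
        assert proc.stdout is not None
        for line in proc.stdout:
            with lock:
                last_output = time.monotonic()
            print(line, end="", flush=True)

    reader = threading.Thread(target=pump, daemon=True)
    reader.start()
    while proc.poll() is None:
        time.sleep(min(quiet_after, 1.0))
        with lock:
            silent = time.monotonic() - last_output
        if silent >= quiet_after:
            elapsed = time.monotonic() - start
            print(f"[heartbeat] still running after {elapsed:.0f}s", flush=True)
            with lock:
                # Reset so the heartbeat repeats at most once per quiet_after.
                last_output = time.monotonic()
    reader.join()  # drain any output still buffered in the pipe
    return proc.returncode
```

Usage, with a hypothetical keyspace name: `run_with_heartbeat(["nodetool", "repair", "--full", "my_keyspace"], quiet_after=300)`. Running it inside screen or tmux, as suggested in the reply, also keeps the repair alive if the SSH session drops anyway. While it runs, `nodetool compactionstats` on the nodes shows the Validation compactions that build the repair's Merkle trees. This offers a quieter progress signal than tracing.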