Re: quietness of full nodetool repair on large dataset

2017-09-28 Thread Jeff Jirsa
Screen and/or subrange repair (e.g. Reaper) -- Jeff Jirsa > On Sep 28, 2017, at 8:23 PM, Mitch Gitman wrote: > > I'm on Apache Cassandra 3.10. I'm interested in moving over to Reaper for > repairs, but in the meantime, I want to get nodetool repair working a little >
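
A minimal sketch of the subrange approach, assuming a hypothetical keyspace and hand-picked token boundaries; in practice a tool like Reaper computes the subranges for you:

    import subprocess

    # Hypothetical keyspace and token boundaries; derive real ranges from
    # the ring (e.g. nodetool ring) or let Reaper handle it.
    KEYSPACE = "my_keyspace"
    RANGES = [
        (-9223372036854775808, -3074457345618258603),
        (-3074457345618258603, 3074457345618258602),
        (3074457345618258602, 9223372036854775807),
    ]

    for start, end in RANGES:
        # -st/-et restrict the repair to one token subrange so each run
        # stays small; -full forces a full (non-incremental) repair.
        subprocess.run(
            ["nodetool", "repair", "-full",
             "-st", str(start), "-et", str(end), KEYSPACE],
            check=True,
        )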

quietness of full nodetool repair on large dataset

2017-09-28 Thread Mitch Gitman
I'm on Apache Cassandra 3.10. I'm interested in moving over to Reaper for repairs, but in the meantime, I want to get nodetool repair working a little more gracefully. What I'm noticing is that, when I'm running a repair for the first time with the --full option after a large initial load of

回复: data loss in different DC

2017-09-28 Thread Peng Xiao
Thanks All -- Original -- From: "Jeff Jirsa"; Date: Thu, Sep 28, 2017 9:16 PM; To: "user"; Subject: Re: data loss in different DC Your quorum writes are only guaranteed to be on half+1 nodes - there's

Re:

2017-09-28 Thread Jeff Jirsa
The digest mismatch exception is not a problem, that's why it's only logged at debug. As Thomas noted, there's a pretty good chance this is https://issues.apache.org/jira/browse/CASSANDRA-13754 - if you see a lot of GCInspector logs indicating GC pauses, that would add confidence to that
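
A quick way to check for those GCInspector pauses is to scan the node's system.log; a rough sketch, assuming the default log location and the usual "GC in <N>ms" message shape:

    import re

    LOG_PATH = "/var/log/cassandra/system.log"  # assumed default location
    # GCInspector lines typically look like "... ParNew GC in 1234ms ...".
    PAUSE_RE = re.compile(r"GCInspector.*GC in (\d+)ms")

    with open(LOG_PATH) as log:
        for line in log:
            match = PAUSE_RE.search(line)
            if match and int(match.group(1)) >= 1000:  # pauses of 1s or more
                print(line.rstrip())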

Re:

2017-09-28 Thread Dan Kinder
Sorry, for that ReadStage exception, I take it back; I accidentally ended up too early in the logs. The node with the ReadStage buildup shows no exceptions in the logs. nodetool tpstats:
Pool Name                    Active   Pending   Completed   Blocked   All time blocked
ReadStage

Re:

2017-09-28 Thread Dan Kinder
Thanks for the responses. @Prem yes this is after the entire cluster is on 3.11, but no I did not run upgradesstables yet. @Thomas no I don't see any major GC going on. @Jeff yeah it's fully upgraded. I decided to shut the whole thing down and bring it back (thankfully this cluster is not

RE:

2017-09-28 Thread Steinmaurer, Thomas
Dan, do you see any major GC? We have been hit by the following memory leak in our loadtest environment with 3.11.0: https://issues.apache.org/jira/browse/CASSANDRA-13754 So, depending on heap size and uptime, you might run into heap trouble. Thomas From: Dan Kinder

Re:

2017-09-28 Thread Jeff Jirsa
That read timeout looks like a non-issue (though we should catch it and squash it differently). MigrationStage is backed up as well. Are you still bouncing nodes? Have you fully upgraded the cluster at this point? On Thu, Sep 28, 2017 at 9:44 AM, Dan Kinder wrote: > I

2.1.19 and 2.2.11 releases up for vote

2017-09-28 Thread Michael Shuler
In case you don't read the dev@ list: I just built the 2.1.19 and 2.2.11 releases to vote on. Testing is always appreciated, so I thought I'd let the user@ list know, in case you want to give them a whirl prior to release.

Nodetool repair -pr

2017-09-28 Thread Dmitry Buzolin
Hi All, Can someone confirm whether "nodetool repair -pr -j2" also runs incrementally (-inc)? I see the docs mention -inc is set by default, but I am not sure if it is enabled when the -pr option is used. Thanks!
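
One way to sidestep the question of defaults is to state the repair mode explicitly; a sketch with a hypothetical keyspace, where -full explicitly requests a full (non-incremental) repair so the -inc default no longer matters:

    import subprocess

    # -pr repairs only this node's primary ranges; -j 2 uses two job
    # threads; -full makes the repair mode explicit.
    subprocess.run(
        ["nodetool", "repair", "-pr", "-full", "-j", "2", "my_keyspace"],
        check=True,
    )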

Re:

2017-09-28 Thread Prem Yadav
Dan, As part of upgrade, did you upgrade the sstables? Sent from mobile. Please excuse typos On 28 Sep 2017 17:45, "Dan Kinder" wrote: > I should also note, I also see nodes become locked up without seeing that > Exception. But the GossipStage buildup does seem correlated

Re: 回复: data loss in different DC

2017-09-28 Thread Reynald Bourtembourg
Do you mean how often we should run repairs in this situation (write at CL=EACH_QUORUM and read at CL=LOCAL_QUORUM)? I don't consider myself an expert in this domain, so I will let the real experts answer this question... You can also refer to this mailing list's history (many threads on this

Re:

2017-09-28 Thread Dan Kinder
I should also note: I also see nodes become locked up without seeing that exception. But the GossipStage buildup does seem correlated with gossip activity, e.g. me restarting a different node. On Thu, Sep 28, 2017 at 9:20 AM, Dan Kinder wrote: > Hi, > > I recently upgraded

[no subject]

2017-09-28 Thread Dan Kinder
Hi, I recently upgraded our 16-node cluster from 2.2.6 to 3.11 and see the following. The cluster does function, for a while, but then some stages begin to back up and the node does not recover and does not drain the tasks, even under no load. This happens both to MutationStage and GossipStage.
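
For watching whether stages such as MutationStage or GossipStage ever drain, a small polling sketch around nodetool tpstats (stage names from the report above; the 30-second interval is arbitrary):

    import subprocess
    import time

    WATCHED = ("MutationStage", "GossipStage", "ReadStage")

    while True:
        out = subprocess.run(
            ["nodetool", "tpstats"],
            capture_output=True, text=True, check=True,
        ).stdout
        for line in out.splitlines():
            # Columns: Pool Name, Active, Pending, Completed, Blocked,
            # All time blocked
            if line.startswith(WATCHED):
                print(line)
        time.sleep(30)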

Re: data loss in different DC

2017-09-28 Thread Jeff Jirsa
Your quorum writes are only guaranteed to be on half+1 nodes - there's no guarantee which nodes those will be. For strong consistency with multiple DCs, you can either: - write at quorum and read at quorum from any dc, or - write each_quorum and read local_quorum from any dc, or - write at
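
A minimal sketch of the each_quorum/local_quorum option using the DataStax Python driver; the contact point, keyspace, and table are hypothetical:

    from cassandra import ConsistencyLevel
    from cassandra.cluster import Cluster
    from cassandra.query import SimpleStatement

    # Hypothetical contact point and keyspace.
    session = Cluster(["10.0.0.1"]).connect("my_ks")

    # EACH_QUORUM: the write must reach a quorum of replicas in every DC.
    write = SimpleStatement(
        "INSERT INTO t (id, v) VALUES (%s, %s)",
        consistency_level=ConsistencyLevel.EACH_QUORUM,
    )
    session.execute(write, (1, "x"))

    # LOCAL_QUORUM: a quorum in the reader's local DC then suffices.
    read = SimpleStatement(
        "SELECT v FROM t WHERE id = %s",
        consistency_level=ConsistencyLevel.LOCAL_QUORUM,
    )
    print(session.execute(read, (1,)).one())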

Re: 回复: data loss in different DC

2017-09-28 Thread Jacob Shadix
How often are you running repairs? -- Jacob Shadix On Thu, Sep 28, 2017 at 7:53 AM, Reynald Bourtembourg < reynald.bourtembo...@esrf.fr> wrote: > Hi, > > You can write with CL=EACH_QUORUM and read with CL=LOCAL_QUORUM to get > strong consistency. > > Kind regards, > Reynald > > > On 28/09/2017

Re: 回复: data loss in different DC

2017-09-28 Thread Reynald Bourtembourg
Hi, You can write with CL=EACH_QUORUM and read with CL=LOCAL_QUORUM to get strong consistency. Kind regards, Reynald On 28/09/2017 13:46, Peng Xiao wrote: even with CL=QUORUM, there is no guarantee of reading the same data in DC2, right? Then multi-DC seems to make no sense?

回复: data loss in different DC

2017-09-28 Thread Peng Xiao
even with CL=QUORUM, there is no guarantee of reading the same data in DC2, right? Then multi-DC seems to make no sense? -- Original -- From: "DuyHai Doan"; Date: Thu, Sep 28, 2017 5:45; To:

Re: data loss in different DC

2017-09-28 Thread Peng Xiao
very sorry for the duplicate mail. -- Original -- From: "";<2535...@qq.com>; Date: Thu, Sep 28, 2017 07:41 PM To: "user"; Subject: data loss in different DC Dear All, We have a cluster with one DC1:RF=3,another DC

data loss in different DC

2017-09-28 Thread Peng Xiao
Dear All, We have a cluster with one DC DC1:RF=3 and another DC DC2:RF=1, with DC2 only for ETL. But we found that sometimes we can query records in DC1 while not being able to find the same record in DC2 with local_quorum. How does this happen? It looks like data loss in DC2. Could anyone please advise? It looks like we can only run

Re: data loss in different DC

2017-09-28 Thread DuyHai Doan
If you're writing into DC1 with CL = LOCAL_xxx, there is no guarantee that you will read the same data in DC2. Only repair will help you. On Thu, Sep 28, 2017 at 11:41 AM, Peng Xiao <2535...@qq.com> wrote: > Dear All, > > We have a cluster with one DC DC1:RF=3 and another DC DC2:RF=1 only for ETL, but >

data loss in different DC

2017-09-28 Thread Peng Xiao
Dear All, We have a cluster with one DC DC1:RF=3 and another DC DC2:RF=1 only for ETL, but we found that sometimes we can query records in DC1 while not being able to find the same record in DC2 with local_quorum. How does this happen? Could anyone please advise? It looks like we can only run repair to fix it. Thanks,