RE: management and monitoring nodetool repair

2015-10-19 Thread aeljami.ext
Thanks Carlos. How can I get information on errors during repair? Thanks. From: Carlos Alonso [mailto:i...@mrcalonso.com] Sent: Monday, 19 October 2015 11:09 To: user@cassandra.apache.org Subject: Re: management and monitoring nodetool repair So repair process has two phases: First one is all about

C* 2.1.10 failed to start

2015-10-19 Thread Kai Wang
It seems to be the same as https://issues.apache.org/jira/browse/CASSANDRA-8544. It started to happen after bulkloading ~100G data and restarting. Windows 2008 R2, JVM 1.8.0_60. It feels like C* didn't shut down cleanly. Is there any way to work around this? Thanks.

Re: management and monitoring nodetool repair

2015-10-19 Thread Carlos Alonso
So the repair process has two phases: The first one is all about calculating Merkle trees and then comparing them with the others. This phase can be monitored with nodetool compactionstats. The second one is about streaming files of data. That one can be monitored with nodetool netstats. Hope it helps. Cheers!
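The two phases described above could be polled from a small script. This is a minimal sketch, not something posted in the thread: it assumes `nodetool` is on the PATH of the node being checked, and the names `PHASE_COMMANDS` and `snapshot` are hypothetical.

```python
import subprocess

# Which nodetool subcommand shows progress for each repair phase,
# as described in the message above. Names here are hypothetical.
PHASE_COMMANDS = {
    "validation": ["nodetool", "compactionstats"],  # Merkle tree calculation
    "streaming": ["nodetool", "netstats"],          # streaming of data files
}


def snapshot(phase, run=subprocess.check_output):
    """Return the current output of the nodetool command for one phase.

    `run` is injectable so the function can be exercised without a cluster.
    """
    return run(PHASE_COMMANDS[phase]).decode()
```

Calling `snapshot("validation")` and then `snapshot("streaming")` at intervals while a repair runs gives a rough view of which phase is active.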

Re: C* 2.1.10 failed to start

2015-10-19 Thread Kai Wang
I fixed this by deleting everything in system\compactions_in_progress- I wonder if there are any side effects of doing this. On Mon, Oct 19, 2015 at 8:56 AM, Kai Wang wrote: > It seems the same as https://issues.apache.org/jira/browse/CASSANDRA-8544. > It started to happen

Re: management and monitoring nodetool repair

2015-10-19 Thread Carlos Alonso
I'd say the logs will pretty much tell you all you need. You just need to find which class logs the repair status (RepairTask.java ?) and once you find it, just tail the logs, grepping for it while the repair is happening, and eventually you'll see the errors as, possibly, java
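The "tail and grep" approach suggested above can be sketched as a small filter. This is only an illustration under assumptions: the thread does not confirm which class logs repair status, so the sketch simply matches any line that mentions "repair" and also looks like a problem. The function name `repair_problems` and both patterns are hypothetical.

```python
import re

# Assumption: repair-related log lines mention "repair" somewhere; the exact
# logging class is not confirmed in the thread.
REPAIR = re.compile(r"repair", re.IGNORECASE)
# Assumption: problems show up as ERROR/WARN levels or a Java exception name.
PROBLEM = re.compile(r"\b(ERROR|WARN)\b|Exception")


def repair_problems(lines):
    """Yield log lines that mention repair and look like an error."""
    for line in lines:
        if REPAIR.search(line) and PROBLEM.search(line):
            yield line.rstrip("\n")
```

Passing an open log file (for example `repair_problems(open(path))`, with `path` pointing at the node's system log) prints candidates to inspect. Continuation lines of a stack trace do not repeat the word "repair", so they would need to be read from the log directly.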

Re: Read query taking a long time

2015-10-19 Thread Carlos Alonso
Could you send cfhistograms and cfstats relevant to the read column family? That could help. Carlos Alonso | Software Engineer | @calonso On 17 October 2015 at 16:15, Brice Figureau < brice+cassan...@daysofwonder.com> wrote: > Hi, > > I've read all I could find on

Re: Read query taking a long time

2015-10-19 Thread Jon Haddad
I wrote a blog post a while back that you may find helpful on diagnosing problems in production. There are a lot of things that could potentially be wrong with your cluster, and going back and forth on the ML to pin down the right one will take forever.

Re: compact/repair shouldn't compete for normal compaction resources.

2015-10-19 Thread Sebastian Estevez
The validation compaction part of repair is susceptible to the compaction throttling knob `nodetool getcompactionthroughput` / `nodetool setcompactionthroughput` and you can use that to tune down the resources that are being used by repair. Check out this post by driftx on advanced repair
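As a rough illustration of the throttling knob mentioned above, the command could be wrapped like this. It is a minimal sketch that assumes `nodetool` is on the PATH; the helper names `throughput_command` and `throttle` are hypothetical, and the right value depends entirely on the cluster.

```python
import subprocess


def throughput_command(mb_per_sec):
    """Build the argv for `nodetool setcompactionthroughput`.

    The value is in MB/s; 0 disables throttling.
    """
    if type(mb_per_sec) is not int or mb_per_sec < 0:
        raise ValueError("throughput must be a non-negative integer (MB/s)")
    return ["nodetool", "setcompactionthroughput", str(mb_per_sec)]


def throttle(mb_per_sec, run=subprocess.check_call):
    """Apply the limit on the local node; `run` is injectable for testing."""
    return run(throughput_command(mb_per_sec))
```

Because the same limit covers validation compaction and normal compaction, lowering it before a repair slows both, which is the limitation raised elsewhere in this thread.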

BEWARE https://issues.apache.org/jira/browse/CASSANDRA-9504

2015-10-19 Thread Graham Sanderson
If you had Cassandra 2.0.x (possibly before) and upgraded to Cassandra 2.1, you may have had commitlog_sync: batch commitlog_sync_batch_window_in_ms: 25 in your cassandra.yaml. It turned out that this was pretty much broken in 2.0 (i.e. fsyncs just happened immediately), but fixed in 2.1,

Re: BEWARE https://issues.apache.org/jira/browse/CASSANDRA-9504

2015-10-19 Thread Michael Shuler
On 10/19/2015 10:55 AM, Graham Sanderson wrote: If you had Cassandra 2.0.x (possibly before) and upgraded to Cassandra 2.1, you may have had commitlog_sync: batch commitlog_sync_batch_window_in_ms: 25 in your cassandra.yaml. It turned out that this was pretty much broken in 2.0 (i.e. fsyncs

Re: Would we have data corruption if we bootstrapped 10 nodes at once?

2015-10-19 Thread Eric Stevens
It seems to me that as long as cleanup hasn't happened, if you *decommission* the newly joined nodes, they'll stream whatever writes they took back to the original replicas. Presumably that should be pretty quick as they won't have nearly as much data as the original nodes (as they only hold data

[RELEASE] Apache Cassandra 3.0.0-rc2 released

2015-10-19 Thread Jake Luciani
The Cassandra team is pleased to announce the release of Apache Cassandra version 3.0.0-rc2. Apache Cassandra is a fully distributed database. It is the right choice when you need scalability and high availability without compromising performance. http://cassandra.apache.org/ Downloads of

Re: BEWARE https://issues.apache.org/jira/browse/CASSANDRA-9504

2015-10-19 Thread Graham Sanderson
But basically, if you were on 2.1.0 thru 2.1.5, you probably couldn’t know to change your config. If you were on 2.1.6 thru 2.1.8, you may not have noticed the NEWS.TXT change and changed your config. If you are on 2.1.9+, you are probably OK. If you are using periodic fsync then you don’t have an

Re: Would we have data corruption if we bootstrapped 10 nodes at once?

2015-10-19 Thread Jeff Jirsa
Worth noting that repair may not work, as it’s possible that NONE of the nodes with data (for some given row) are still valid replicas according to the DHT/Tokens, so repair will not find any of the replicas with the data. From: Robert Coli Reply-To: "user@cassandra.apache.org" Date:

Re: Would we have data corruption if we bootstrapped 10 nodes at once?

2015-10-19 Thread Robert Coli
On Sun, Oct 18, 2015 at 8:10 PM, Kevin Burton wrote: > ouch.. OK.. I think I really shot myself in the foot here then. This > might be bad. > Yep. https://issues.apache.org/jira/browse/CASSANDRA-7069 - "Prevent operator mistakes due to simultaneous bootstrap" But this

Re: compact/repair shouldn't compete for normal compaction resources.

2015-10-19 Thread Robert Coli
On Mon, Oct 19, 2015 at 9:30 AM, Kevin Burton wrote: > I think the point I was trying to make is that on highly loaded boxes, > repair should take lower priority than normal compactions. > You can manually do this by changing the thread priority of compaction threads which

Is there any configuration so that local program on C* node can connect using localhost and remote program using IP/name?

2015-10-19 Thread Ravi
I have a two-node C* cluster and I want to run Spark jobs locally on these two nodes. Inside the Spark job I have to set the connection URL to localhost so that it inserts data into the local C* instance (I am using the same Cassandra nodes as the Spark job's slaves for execution via Mesos). The problem is that if I change

Re: Would we have data corruption if we bootstrapped 10 nodes at once?

2015-10-19 Thread Robert Coli
On Mon, Oct 19, 2015 at 9:20 AM, Branton Davis wrote: > Is that also true if you're standing up multiple nodes from backups that > already have data? Could you not stand up more than one at a time since > they already have the data? > An operator probably almost

unusual GC log

2015-10-19 Thread 曹志富
INFO [Service Thread] 2015-10-20 10:42:47,854 GCInspector.java:252 - ParNew GC in 476ms. CMS Old Gen: 4288526240 -> 4725514832; Par Eden Space: 671088640 -> 0; INFO [Service Thread] 2015-10-20 10:42:50,870 GCInspector.java:252 - ParNew GC in 423ms. CMS Old Gen: 4725514832 -> 5114687560; Par
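To make entries like these easier to compare, the pause time and the change in old-generation occupancy can be pulled out with a regular expression. This is a minimal sketch based only on the line format quoted above; the names `GC_LINE` and `old_gen_growth` are hypothetical, and it assumes the Old Gen figures are byte counts.

```python
import re

# Matches the ParNew portion of a GCInspector line as quoted above.
GC_LINE = re.compile(
    r"ParNew GC in (?P<ms>\d+)ms\. CMS Old Gen: (?P<before>\d+) -> (?P<after>\d+)"
)


def old_gen_growth(log_text):
    """Return (pause_ms, old_gen_delta) for each ParNew entry found."""
    return [
        (int(m["ms"]), int(m["after"]) - int(m["before"]))
        for m in GC_LINE.finditer(log_text)
    ]
```

For the two entries quoted above this yields `(476, 436988592)` and `(423, 389172728)`, that is, the old generation grew by roughly 400 MB at each young collection.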

Re: compact/repair shouldn't compete for normal compaction resources.

2015-10-19 Thread Kevin Burton
Yes... it's not currently possible :) I think it should be. Say the IO on your C* is at 60% utilization. If you do a repair, this would require 120% utilization, which is obviously not possible, so now your app is down / offline until the repair finishes. If you could throttle repair separately this

Re: Would we have data corruption if we bootstrapped 10 nodes at once?

2015-10-19 Thread Branton Davis
Is that also true if you're standing up multiple nodes from backups that already have data? Could you not stand up more than one at a time since they already have the data? On Mon, Oct 19, 2015 at 10:48 AM, Eric Stevens wrote: > It seems to me that as long as cleanup hasn't

Re: compact/repair shouldn't compete for normal compaction resources.

2015-10-19 Thread Kevin Burton
I think the point I was trying to make is that on highly loaded boxes, repair should take lower priority than normal compactions. Having a throttle on *both* doesn't solve the problem. So I need a setcompactionthroughput and a setrepairthroughput and total throughput would be the sum of

Re: BEWARE https://issues.apache.org/jira/browse/CASSANDRA-9504

2015-10-19 Thread Graham Sanderson
- commitlog_sync_batch_window_in_ms behavior has changed from the maximum time to wait between fsync to the minimum time. We are working on making this more user-friendly (see CASSANDRA-9533) but in the meantime, this means 2.1 needs a much smaller batch window to keep writer threads