Re: OOM at Bootstrap Time

2014-10-26 Thread DuyHai Doan
Hello Maxime, can you put the complete logs and config somewhere? It would be interesting to know the cause of the OOM. On Sun, Oct 26, 2014 at 3:15 AM, Maxime maxim...@gmail.com wrote: Thanks a lot, that is comforting. We are also small at the moment so I can definitely relate with

Re: OOM at Bootstrap Time

2014-10-26 Thread Maxime
I've emailed you a raw log file of an instance of this happening. I've been monitoring the timing of events in tpstats and the logs more closely, and I believe this is what is happening: - For some reason, C* decides to provoke a flush storm (I say some reason, I'm sure there is one but I have

Re: OOM at Bootstrap Time

2014-10-26 Thread DuyHai Doan
Hello Maxime, increasing the flush writers won't help if your disk I/O is not keeping up. I've had a look at the log file; below are some remarks: 1) There are a lot of SSTables on disk for some tables (events for example, but not only). I've seen that some compactions are taking up to 32

Re: which snitch ?

2014-10-26 Thread Srinivas Chamarthi
What about the nodes on the private cloud cluster? If I specify ec2MultiRegion, it fails because the snitch tries to invoke the AWS API on the node. Should I specify GossipingPropertyFileSnitch? I am not sure if I can mix and match. Can someone advise me? thx srinivas On

Re: OOM at Bootstrap Time

2014-10-26 Thread Maxime
Thank you very much for your reply. This is a deeper interpretation of the logs than I can do at the moment. Regarding 2), it's a good assumption on your part, but in this case, non-obviously, the loc table's primary key is actually not id; the schema changed historically, which has led to this odd

Re: OOM at Bootstrap Time

2014-10-26 Thread Jonathan Haddad
If the issue is related to I/O, you're going to want to determine if you're saturated. Take a look at `iostat -dmx 1`; you'll see avgqu-sz (queue size) and svctm (service time). The higher those numbers are, the more overwhelmed your disk is. On Sun, Oct 26, 2014 at 12:01 PM, DuyHai Doan
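
To make the iostat suggestion above concrete, here is a minimal Python sketch that reads captured `iostat -dmx` output and lists devices whose queue size looks high. The sample text, the function names `parse_iostat` and `saturated`, and the threshold of 10 are all assumptions for illustration and are not from the thread. Column names also vary between sysstat versions, so the parser looks columns up by header name rather than by position.

```python
def parse_iostat(text, columns=("avgqu-sz", "svctm")):
    """Return one dict per device row, holding the requested columns as floats."""
    rows = []
    header = None
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        # The header row starts with "Device:" (or "Device" in some versions).
        if fields[0].rstrip(":") == "Device":
            header = fields
            continue
        # Skip banner lines and anything that does not match the header width.
        if header is None or len(fields) != len(header):
            continue
        row = {"device": fields[0]}
        for col in columns:
            if col in header:
                row[col] = float(fields[header.index(col)])
        rows.append(row)
    return rows


def saturated(rows, max_queue=10.0):
    """List devices whose average queue size exceeds max_queue (hypothetical threshold)."""
    return [r["device"] for r in rows if r.get("avgqu-sz", 0.0) > max_queue]


# Made-up sample in the layout of `iostat -dmx`; the values are illustrative only.
SAMPLE = """\
Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sda               0.00    12.00    3.00  250.00     0.05    30.00   243.00    45.20  180.10   10.00  182.00   3.90  99.00
sdb               0.00     0.00    0.00    1.00     0.00     0.01    16.00     0.01    1.00    0.00    1.00   1.00   0.10
"""

if __name__ == "__main__":
    print(saturated(parse_iostat(SAMPLE)))
```

A sensible threshold depends on the hardware (a single spinning disk and an SSD array behave very differently), so treat the number as a starting point to tune rather than a rule.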

Re: which snitch ?

2014-10-26 Thread Colin
I would try PropertyFileSnitch and use the public IPs of the nodes in AWS. You'll need to set the configuration files on each node. On Oct 26, 2014, at 9:44 PM, Srinivas Chamarthi srinivas.chamar...@gmail.com wrote: what about for the nodes on the private cloud cluster ? if I mention,

decommissioning a cassandra node

2014-10-26 Thread Tim Dunphy
Hey all, I'm trying to decommission a node. First I'm getting a status: [root@beta-new:/usr/local] #nodetool status Note: Ownership information does not include topology; for complete information, specify a keyspace Datacenter: datacenter1 === Status=Up/Down |/

Re: OOM at Bootstrap Time

2014-10-26 Thread DuyHai Doan
Should doing a major compaction on those nodes lead to a restructuration of the SSTables? -- Beware of major compaction on SizeTiered: it will create 2 giant SSTables, and the expired/outdated/tombstone columns in this big file will never be cleaned since the SSTable will never get a chance to

Re: OOM at Bootstrap Time

2014-10-26 Thread Maxime
Hmm, thanks for the reading. I initially followed some (perhaps too old) maintenance scripts, which included weekly 'nodetool compact'. Is there a way for me to undo the damage? Tombstones will be a very important issue for me since the dataset is very much a rolling dataset using TTLs heavily.