Re: Node bootstrap

2014-08-12 Thread Ruchir Jha
Still having issues with node bootstrapping. The new node just died because it went into a full GC, and the nodes it had actual streams with noticed it was down. After the full GC finished, the new node printed this log: ERROR 02:52:36,259 Stream failed because /10.10.20.35 died or was restarted/removed (streams

Re: Node bootstrap

2014-08-05 Thread Ruchir Jha
Thanks Patricia for your response! On the new node, I just see a lot of the following: INFO [FlushWriter:75] 2014-08-05 09:53:04,394 Memtable.java (line 400) Writing Memtable INFO [CompactionExecutor:3] 2014-08-05 09:53:11,132 CompactionTask.java (line 262) Compacted 12 sstables to so basically

Re: Node bootstrap

2014-08-05 Thread Ruchir Jha
Yes, num_tokens is set to 256. initial_token is blank on all nodes, including the new one. On Tue, Aug 5, 2014 at 10:03 AM, Mark Reddy mark.re...@boxever.com wrote: My understanding was that if initial_token is left empty on the new node, it just contacts the heaviest node and bisects its token

Re: Node bootstrap

2014-08-05 Thread Ruchir Jha
Also not sure if this is relevant but just noticed the nodetool tpstats output:

Pool Name      Active   Pending   Completed   Blocked   All time blocked
FlushWriter         0         0        1136         0                512

Looks like about 50% of flushes are
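The "about 50%" estimate can be checked from the FlushWriter row quoted above. The following is a minimal sketch, not part of the original thread: the helper name blocked_ratio is hypothetical, and it assumes the six whitespace-separated columns shown in the tpstats header (pool name, active, pending, completed, blocked, all time blocked). It treats "all time blocked" divided by "completed" as a rough indicator only.

```python
def blocked_ratio(tpstats_row: str) -> float:
    """Return all-time-blocked / completed for one tpstats row.

    Assumed column order: pool name, active, pending, completed,
    blocked, all time blocked (hypothetical helper, for illustration).
    """
    fields = tpstats_row.split()
    completed = int(fields[3])
    all_time_blocked = int(fields[5])
    return all_time_blocked / completed


# Row taken verbatim from the message above.
row = "FlushWriter 0 0 1136 0 512"
print(f"{blocked_ratio(row):.1%} of completed flushes were blocked at some point")
```

With the numbers in the message this comes out at roughly 45%, which is consistent with the "about 50%" reading.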

Re: Node bootstrap

2014-08-05 Thread Mark Reddy
Yes num_tokens is set to 256. initial_token is blank on all nodes including the new one. Ok, so you have num_tokens set to 256 for all nodes with initial_token commented out. This means you are using vnodes, and the new node will automatically grab a list of tokens to take over responsibility

Re: Node bootstrap

2014-08-05 Thread Ruchir Jha
nodetool status:

Datacenter: datacenter1
===
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address      Load     Tokens  Owns (effective)  Host ID                               Rack
UN  10.10.20.27  1.89 TB  256     25.4%             76023cdd-c42d-4068-8b53-ae94584b8b04  rack1
UN

Re: Node bootstrap

2014-08-05 Thread Ruchir Jha
Also, Mark, regarding your comment on my tpstats output: below is my iostat output. The iowait is at 4.59%, which suggests no IO pressure, but we are still seeing the bad flush performance. Should we try increasing the flush writers? Linux 2.6.32-358.el6.x86_64 (ny4lpcas13.fusionts.corp) 08/05/2014

Re: Node bootstrap

2014-08-05 Thread Mark Reddy
Hi Ruchir, The large number of blocked flushes and the number of pending compactions would still indicate IO contention. Can you post the output of 'iostat -x 5 5'? If you do in fact have spare IO, there are several configuration options you can tune, such as increasing the number of flush

Re: Node bootstrap

2014-08-05 Thread Ruchir Jha
Right now, we have 6 flush writers and compaction_throughput_mb_per_sec is set to 0, which I believe disables throttling. Also, here is the iostat -x 5 5 output: Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s avgrq-sz avgqu-sz await svctm %util sda 10.00

Re: Node bootstrap

2014-08-05 Thread Ruchir Jha
Also, right now the top command shows that we are at 500-700% CPU, and we have 23 total processors, which means we have a lot of idle CPU left over, so throwing more threads at compaction and flush should alleviate the problem? On Tue, Aug 5, 2014 at 2:57 PM, Ruchir Jha ruchir@gmail.com
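The idle-CPU reasoning in the message above can be made concrete. This is a minimal sketch under one assumption not stated in the thread: that top reports per-process CPU on a scale where 100% equals one fully busy processor. The function name cpu_fraction is hypothetical.

```python
def cpu_fraction(top_percent: float, processors: int) -> float:
    """Convert a top-style CPU percentage into a fraction of total capacity.

    Assumes 100% in top corresponds to one fully busy processor.
    """
    return top_percent / (100.0 * processors)


# Figures quoted in the message: 500-700% CPU on 23 processors.
for pct in (500, 700):
    share = cpu_fraction(pct, 23)
    print(f"{pct}% in top on 23 processors is {share:.1%} of total CPU")
```

Under that assumption the process is using roughly 22% to 30% of the machine, so most of the CPU is indeed idle. Whether extra compaction or flush threads would help still depends on where the flushes are actually blocking.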

Re: Node bootstrap

2014-08-04 Thread Patricia Gorla
Ruchir, What exactly are you seeing in the logs? Are you running major compactions on the new bootstrapping node? With respect to the seed list, it is generally advisable to use 3 seed nodes per AZ / DC. Cheers, On Mon, Aug 4, 2014 at 11:41 AM, Ruchir Jha ruchir@gmail.com wrote: I am