|trickle_fsync| has been enabled for a long time in our settings (just noticed):

trickle_fsync: true
trickle_fsync_interval_in_kb: 10240

On Thu, Feb 19, 2015 at 12:12 PM, Michał Łowicki <mlowi...@gmail.com> wrote:

> On Thu, Feb 19, 2015 at 11:02 AM, Carlos Rolo <r...@pythian.com> wrote:
>
>> Do you have trickle_fsync enabled? Try to enable that and see if it
>> solves your problem, since you are running out of non-heap memory.
>>
>> Another question: is it always the same nodes that die? Or is it 2 out
>> of 4 that die?
>
> Always the same nodes. Upgraded to 2.1.3 two hours ago, so we'll monitor
> whether the issue has been fixed there. If not, we will try to enable
> |trickle_fsync|.
>
>> Regards,
>>
>> Carlos Juzarte Rolo
>> Cassandra Consultant
>>
>> Pythian - Love your data
>>
>> rolo@pythian | Twitter: cjrolo | Linkedin: linkedin.com/in/carlosjuzarterolo
>> <http://linkedin.com/in/carlosjuzarterolo>
>> Tel: 1649
>> www.pythian.com
>>
>> On Thu, Feb 19, 2015 at 10:49 AM, Michał Łowicki <mlowi...@gmail.com>
>> wrote:
>>
>>> On Thu, Feb 19, 2015 at 10:41 AM, Carlos Rolo <r...@pythian.com> wrote:
>>>
>>>> So compaction doesn't seem to be your problem (you can check with
>>>> nodetool compactionstats just to be sure).
>>>
>>> pending tasks: 0
>>>
>>>> How much is your write latency on your column families? I had OOM
>>>> related to this before, and there was a tipping point around 70ms.
>>>
>>> Write request latency is below 0.05 ms/op (avg). Checked with OpsCenter.
>>>
>>> --
>>> BR,
>>> Michał Łowicki
>
> --
> BR,
> Michał Łowicki

--
BR,
Michał Łowicki