It could also be CASSANDRA-11412 if you have many sstables and vnodes.

On Wed, Jun 22, 2016 at 2:50 PM, Bhuvan Rawal <bhu1ra...@gmail.com> wrote:
> Thanks for the info Paulo, Robert. I tried further testing with other
> parameters and the problem persisted. We could be hitting either 11739 or
> 11206, but I'm skeptical about 11739 because repair works well in 3.5 and
> 11739 seems to be fixed in 3.7/3.0.7.
>
> We may be able to work around this by increasing the heap size, at the
> cost of some page cache, until we upgrade to a newer version.
>
> On Mon, Jun 20, 2016 at 10:00 PM, Paulo Motta <pauloricard...@gmail.com>
> wrote:
>
>> You could also be hitting CASSANDRA-11739, which was fixed in 3.0.7 and
>> could potentially cause OOMs for long-running repairs.
>>
>>
>> 2016-06-20 13:26 GMT-03:00 Robert Stupp <sn...@snazy.de>:
>>
>>> One possibility might be CASSANDRA-11206 (Support large partitions on
>>> the 3.0 sstable format), which reduces heap usage for other operations
>>> (like repair, compactions) as well.
>>> You can verify that by setting column_index_cache_size_in_kb in
>>> cassandra.yaml to a really high value like 10000000. If you see the
>>> same behaviour in 3.7 with that setting, there's not much you can do
>>> except upgrade to 3.7, as that change went into 3.6 and not into 3.0.x.
>>>
>>> --
>>> Robert Stupp
>>> @snazy
>>>
>>> On 20 Jun 2016, at 18:13, Bhuvan Rawal <bhu1ra...@gmail.com> wrote:
>>>
>>> Hi All,
>>>
>>> We are running Cassandra 3.0.3 in production with a max heap size of
>>> 8GB. There has been a consistent issue with nodetool repair for a
>>> while. We have tried running it with multiple options (--pr, --local
>>> as well); sometimes a node went down with an Out of Memory error, and
>>> at other times nodes stopped accepting any connections, even JMX
>>> nodetool commands.
>>>
>>> With the same data on 3.7, repair ran successfully without
>>> encountering any of the above issues. I then tried increasing the
>>> heap to 16GB on 3.0.3, and repair ran successfully.
>>>
>>> I then analyzed memory usage during nodetool repair for 3.0.3 (16GB
>>> heap) vs 3.7 (8GB heap). 3.0.3 occupied 11-14 GB at all times,
>>> whereas 3.7 ranged between 1 and 4.5 GB while repair ran. Both ran a
>>> full repair on the same dataset of unrepaired data.
>>>
>>> We would like to know whether this is a known bug that was fixed
>>> after 3.0.3, and whether there is a way to run repair on 3.0.3
>>> without increasing the heap size, since 8GB works for us for all
>>> other activities.
>>>
>>> Please find attached the VisualVM snapshots.
>>>
>>> <Screenshot from 2016-06-20 21:06:09.png>
>>> 3.0.3 VisualVM snapshot: consistent heap usage of greater than 12 GB.
>>>
>>> <Screenshot from 2016-06-20 21:05:57.png>
>>> 3.7 VisualVM snapshot: 8GB max heap, with heap usage up to about 5GB.
>>>
>>> Thanks & Regards,
>>> Bhuvan Rawal
>>>
>>> PS: In case the snapshots are not visible, they can be viewed at
>>> the following links:
>>> 3.0.3:
>>> https://s31.postimg.org/4e7ifsjaz/Screenshot_from_2016_06_20_21_06_09.png
>>> 3.7:
>>> https://s31.postimg.org/xak32s9m3/Screenshot_from_2016_06_20_21_05_57.png
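
For reference, the check Robert describes in his quoted message would look roughly like the fragment below. This is only a sketch: the value is the one he suggests, the setting applies to 3.6 and later only (per his message, the change went into 3.6 and not 3.0.x), and the comments on what the option controls are my reading of CASSANDRA-11206, not something confirmed in this thread.

```yaml
# cassandra.yaml on the 3.7 test node (the option is not available in 3.0.x).
#
# Assumption: with a very high threshold, partition index entries stay on
# heap as they did before CASSANDRA-11206, so heap usage during repair
# should resemble 3.0.3 if that ticket is what made the difference.
column_index_cache_size_in_kb: 10000000
```

If 3.7 with this setting shows the same 11-14 GB heap usage during repair that 3.0.3 did, that would point to CASSANDRA-11206; if heap usage stays in the 1-4.5 GB range, one of the other tickets mentioned in the thread is the more likely candidate. The change would need to be reverted after the test.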