cc'ing user back in...

On 12 Aug. 2017 01:55, "kurt greaves" <k...@instaclustr.com> wrote:
> How much memory do these machines have? Typically we've found that G1
> isn't worth it until you get to around 24G heaps, and even then it's not
> really better than CMS. You could try CMS with an 8G heap and 2G new size.
>
> However, as the OOM is only happening on one node, have you ensured there
> are no extra processes running on that node that could be consuming extra
> memory? Note that the OOM killer will kill the process with the highest oom
> score, which generally corresponds to the process using the most memory,
> but is not necessarily the process causing the problem.
>
> Also, could you run nodetool info on the problem node and one other and
> dump the output in a gist? It would be interesting to see if there is a
> significant difference in off-heap.
>
> On 11 Aug. 2017 17:30, "Micha" <mich...@fantasymail.de> wrote:
>
>> It's an OOM issue, the kernel kills the Cassandra job.
>> The config was to use offheap buffers and a 20G java heap; I changed this
>> to use heap buffers and a 16G java heap. I added a new node yesterday
>> which got streams from 4 other nodes. They all succeeded except on the
>> one node which failed before. This time again the db was killed by the
>> kernel. At the moment I don't know what the reason is, since the
>> nodes are equal.
>>
>> To me it seems G1GC is not able to free the memory fast enough.
>> The settings were MaxGCPauseMillis=600, ParallelGCThreads=10 and
>> ConcGCThreads=10, which may be too high since the node has only 8
>> cores.
>> I changed this to ParallelGCThreads=8 and ConcGCThreads=2, as is
>> mentioned in the comments of jvm.options.
>>
>> Since the bootstrap of the fifth node did not complete I will start it
>> again and check if the memory is still decreasing over time.
>>
>> Michael
>>
>> On 11.08.2017 01:25, Jeff Jirsa wrote:
>> >
>> > On 2017-08-08 01:00 (-0700), Micha <mich...@fantasymail.de> wrote:
>> >> Hi,
>> >>
>> >> it seems I'm not able to add a 3 node dc to a 3 node dc. After
>> >> starting the rebuild on a new node, nodetool netstats shows it will
>> >> receive 1200 files from node-1 and 5000 from node-2. The stream from
>> >> node-1 completes but the stream from node-2 always fails, after
>> >> sending ca 4000 files.
>> >>
>> >> After restarting the rebuild it again starts to send the 5000 files.
>> >> The whole cluster is connected via one switch only, no firewall
>> >> between, the network shows no errors.
>> >> The machines have 8 cores, 32GB RAM and two 1TB discs as raid0.
>> >> The logs show no errors. The size of the data is ca 1TB.
>> >
>> > Is there anything in `dmesg`? System logs? Nothing? Is node2 running?
>> > Is node3 running?
>> >
>> > ---------------------------------------------------------------------
>> > To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
>> > For additional commands, e-mail: dev-h...@cassandra.apache.org
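
P.S. On the oom score point: a minimal sketch, assuming a Linux node where /proc/<pid>/oom_score and /proc/<pid>/comm are readable, of how one might list the processes the kernel would currently pick first. The function name rank_by_oom_score and its parameters are made up for illustration; this is not a Cassandra or nodetool feature.

```python
import os


def rank_by_oom_score(proc_root="/proc", top=5):
    """Return up to `top` (score, pid, name) tuples, highest oom_score first.

    `proc_root` is a parameter only so the function can be pointed at a
    test directory; on a real node the default /proc is what you want.
    """
    ranked = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # only numeric directories are processes
        base = os.path.join(proc_root, entry)
        try:
            with open(os.path.join(base, "oom_score")) as f:
                score = int(f.read().strip())
            with open(os.path.join(base, "comm")) as f:
                name = f.read().strip()
        except (OSError, ValueError):
            continue  # process exited meanwhile, or is not readable
        ranked.append((score, int(entry), name))
    # Highest score first; ties broken by lower pid for a stable order.
    ranked.sort(key=lambda t: (-t[0], t[1]))
    return ranked[:top]


if __name__ == "__main__":
    for score, pid, name in rank_by_oom_score():
        print(f"{score:>6}  {pid:>7}  {name}")
```

Running this on the problem node and on one healthy node and comparing the top entries would show whether something other than the Cassandra JVM is ranking high on the failing node.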