Re: Open source equivalents of OpsCenter
My experience while looking for a replacement is written up at https://medium.com/@mlowicki/alternatives-to-datastax-opscenter-8ad893efe063 On Thursday, 14 July 2016, Stefano Ortolani <ostef...@gmail.com> wrote: > Replaced OpsCenter with a mix of: > > * metrics-graphite-3.1.0.jar installed in the same classpath as C* > * Custom script to push system metrics (cpu/mem/io) > * Grafana to create the dashboard > * Custom repairs script > > Still not optimal but getting there... > > Stefano > > On Thu, Jul 14, 2016 at 10:18 AM, Romain Hardouin <romainh...@yahoo.fr> wrote: > >> Hi Juho, >> >> Out of curiosity, which stack did you use to make your dashboard? >> >> Romain >> >> On Thursday, 14 July 2016 at 10:43, Juho Mäkinen <juho.maki...@gmail.com> wrote: >> >> >> I'm doing some work on replacing OpsCenter in our setup. I ended up creating >> a Docker container which contains the following features: >> >> - Cassandra 2.2.7 >> - MX4J (a JMX to REST bridge) as a java-agent >> - metrics-graphite-3.1.0.jar (exports some, but not all, JMX metrics to Graphite) >> - a custom Ruby script which uses MX4J to export to Graphite the JMX metrics >> we don't otherwise get. >> >> With this I get all our Cassandra instances and their JMX-exposed >> data into Graphite, which allows us to use Grafana and Graphite to draw >> pretty dashboards. >> >> In addition I started writing some code which currently provides the >> following features: >> - A dashboard which provides a ring view similar to what OpsCenter does, >> with onMouseOver features to display more info on each node. 
>> - A simple HTTP GET/POST based API to: >> - Set up a new non-vnode-based cluster >> - Get a JSON blob of cluster information: all its tokens, machines >> and so on >> - An API for new cluster instances so that they can get a token slot >> from the ring when they boot. >> - An option to kill a dead node and mark its slot for replacement, so the >> new booting node can use the cassandra.replace_address option. >> >> The code is not yet packaged in any way for distribution and some parts >> depend on our Chef installation, but if there's interest I can publish at >> least some parts of it. >> >> - Garo >> >> On Thu, Jul 14, 2016 at 10:54 AM, Romain Hardouin <romainh...@yahoo.fr> wrote: >> >> Do you run C* on physical machines or in the cloud? If the topology >> doesn't change too often you can have a look at Zabbix. The downside is that >> you have to set up all the JMX metrics yourself... but that's also a good >> point because you can have custom metrics. If you want nice >> graphs/dashboards you can use Grafana to plot Zabbix data. (We're also >> using a SaaS but that's not open source). >> For rolling restarts and other admin stuff we're using Rundeck. It's a >> great tool when working in a team. >> >> (I think it's time to implement an open source alternative to OpsCenter. >> If some people are interested, I'm in.) >> >> Best, >> >> Romain >> >> >> >> >> On Thursday, 14 July 2016 at 00:01, Ranjib Dey <dey.ran...@gmail.com> wrote: >> >> >> we use Datadog (metrics emitted as raw StatsD) for the dashboard. All >> repair & compaction is done via Blender & Serf [1]. >> [1] https://github.com/pagerduty/blender >> >> >> On Wed, Jul 13, 2016 at 2:42 PM, Kevin O'Connor <ke...@reddit.com> wrote: >> >> Now that OpsCenter doesn't work with open source installs, are there any >> runs at an open source equivalent? 
I'd be more interested in looking at >> metrics of a running cluster and doing other tasks like managing >> repairs/rolling restarts more so than historical data. >> >> >> >> >> >> >> >> > -- BR, Michał Łowicki
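Stefano and Juho both mention custom scripts that push metrics to Graphite. As a minimal sketch of what such a pusher looks like, this formats metrics in Carbon's plaintext protocol and ships them over TCP; the host name and metric paths here are made-up placeholders, not anything from the thread:

```python
import socket
import time

def graphite_line(path, value, timestamp=None):
    """Format one metric in Graphite's plaintext protocol: 'path value timestamp\\n'."""
    if timestamp is None:
        timestamp = int(time.time())
    return "%s %s %d\n" % (path, value, timestamp)

def push_metrics(lines, host="graphite.example.com", port=2003):
    """Send pre-formatted plaintext lines to a Carbon receiver."""
    sock = socket.create_connection((host, port), timeout=5)
    try:
        sock.sendall("".join(lines).encode("ascii"))
    finally:
        sock.close()

if __name__ == "__main__":
    lines = [
        graphite_line("cassandra.db1.cpu.user", 12.5),
        graphite_line("cassandra.db1.mem.used_bytes", 48 * 1024 ** 3),
    ]
    # push_metrics(lines)  # uncomment once a Carbon host is reachable
    print("".join(lines), end="")
```

A cron job running something like this per node is essentially the "custom script to push system metrics" from Stefano's list.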
Re: Cassandra monitoring
My team ended up with Diamond / StatsD / Graphite / Grafana (more background at medium.com/@mlowicki/alternatives-to-datastax-opscenter-8ad893efe063). We're relying heavily on this stack in other projects and our infra in general. On Tue, Jun 14, 2016 at 10:29 PM, Jonathan Haddad <j...@jonhaddad.com> wrote: > OpsCenter going forward is limited to DataStax Enterprise versions. > > I know a lot of people like DataDog, but I haven't used it. Maybe other > people on the list can speak from recent first-hand experience on its pros > and cons. > > > On Tue, Jun 14, 2016 at 1:20 PM Arun Ramakrishnan < > sinchronized.a...@gmail.com> wrote: > >> Thanks Jonathan. >> >> Out of curiosity, does OpsCenter support some later version of Cassandra >> that is not OSS? >> >> Well, the most minimal requirement is that I want to be able to monitor >> cluster health and hook this info into some alerting platform. We are AWS >> heavy. We rely really heavily on AWS CloudWatch for our metrics as of now. >> We prefer not to spend our time setting up additional tools if we can help >> it. So, if we needed a 3rd party service we would consider an APM or >> monitoring service that is on the cheaper side. >> >> >> >> >> On Tue, Jun 14, 2016 at 12:20 PM, Jonathan Haddad <j...@jonhaddad.com> >> wrote: >> >>> Depends what you want to monitor. I wouldn't use a lesser version of >>> Cassandra just for OpsCenter; it doesn't give you a ton you can't get elsewhere >>> and it's not ever going to support OSS > 2.1, so you kind of limit yourself >>> to a pretty old version of Cassandra for no good reason. >>> >>> What else do you use for monitoring in your infra? I've used a mix of >>> OSS tools (Nagios, StatsD, Graphite, ELK) and hosted solutions. The nice >>> part about them is that you can monitor your whole stack in a single UI, not >>> just your database. 
>>> >>> On Tue, Jun 14, 2016 at 12:10 PM Arun Ramakrishnan < >>> sinchronized.a...@gmail.com> wrote: >>> >>>> What are the options for a very small and nimble startup to keep a >>>> Cassandra cluster running well oiled? We are on AWS. We are interested in a >>>> monitoring tool and potentially also cluster management tools. >>>> >>>> We are currently on Apache Cassandra 3.7. We were hoping DataStax >>>> OpsCenter would be it (it is free for startups our size). But it looks like >>>> it does not support Cassandra versions greater than v2.1. That is pretty >>>> surprising considering Cassandra v2.1 came out in 2014. >>>> >>>> We would consider downgrading to DataStax Cassandra 2.1 just to have >>>> robust monitoring tools. But I am not sure whether having OpsCenter offsets all >>>> the improvements that have been added to Cassandra since 2.1. >>>> >>>> Sematext has integrations for monitoring Cassandra. Does anyone have >>>> good experience with it? >>>> >>>> How much work would be involved to set up Ganglia or some such option >>>> for Cassandra? >>>> >>>> Thanks, >>>> Arun >>>> >>>> >>>> >>>> >>>> >>>> >>>> >> -- BR, Michał Łowicki
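Ranjib mentions emitting metrics as raw StatsD and the thread above lands on a StatsD/Graphite/Grafana stack. For reference, the StatsD wire format is just a small UDP datagram; a sketch with a hypothetical metric name and the conventional localhost agent address:

```python
import socket

def statsd_packet(name, value, metric_type="g"):
    """Build a StatsD datagram, e.g. 'cassandra.db1.pending_compactions:42|g'.
    'g' is a gauge; 'c' (counter) and 'ms' (timer) are the other common types."""
    return "%s:%s|%s" % (name, value, metric_type)

def send_gauge(name, value, host="127.0.0.1", port=8125):
    """Fire-and-forget UDP send; the StatsD agent aggregates and forwards
    to Graphite, so a lost packet only costs one sample."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        sock.sendto(statsd_packet(name, value).encode("ascii"), (host, port))
    finally:
        sock.close()
```

Because it's UDP, instrumented scripts can emit freely without risking blocking on a slow or absent collector.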
Re: Replacing disks
On Mon, Feb 29, 2016 at 8:52 AM, Alain RODRIGUEZ <arodr...@gmail.com> wrote: > I wrote that a few days ago: > http://thelastpickle.com/blog/2016/02/25/removing-a-disk-mapping-from-cassandra.html > > I believe this might help you. > Yes, looks promising. Thanks! > C*heers, > --- > > Alain Rodriguez - al...@thelastpickle.com > France > > The Last Pickle - Apache Cassandra Consulting > http://www.thelastpickle.com > On 28 Feb 2016 at 15:17, "Clint Martin" < > clintlmar...@coolfiretechnologies.com> wrote: > >> Code-wise, I am not completely familiar with what accomplishes this >> behavior. But my understanding and experience is that Cassandra 2.1 picks the >> drive with the most free space when choosing a destination for a compaction >> operation. >> (This is an overly simplistic description. Reality is always more >> nuanced. DataStax had a blog post that describes this better, as well as >> limitations of the algorithm in 2.1 which are addressed in the 3.x releases.) >> >> Clint >> On Feb 28, 2016 10:11 AM, "Michał Łowicki" <mlowi...@gmail.com> wrote: >> >>> >>> >>> On Sun, Feb 28, 2016 at 4:00 PM, Clint Martin < >>> clintlmar...@coolfiretechnologies.com> wrote: >>> >>>> Your plan for replacing your 200GB drive sounds good to me. Since you >>>> are running JBOD, I wouldn't worry about manually redistributing data from >>>> your other disk to the new one. Cassandra will do that for you as it >>>> performs compaction. >>>> >>> >>> Is this done by pickWriteableDirectory >>> <https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/db/Directories.java#L386> >>> ? >>> >>>> While you're doing the drive change, you need to complete the swap and >>>> restart of the node before the hinted handoff window expires on the other >>>> nodes. If you do not complete in time, you'll want to perform a repair on >>>> the node. >>>> >>> >>> Yes. Thanks! 
>>> >>> >>>> >>>> Clint >>>> On Feb 28, 2016 9:33 AM, "Michał Łowicki" <mlowi...@gmail.com> wrote: >>>> >>>>> Hi, >>>>> >>>>> I have two disks on a single box (500GB + 200GB). data_file_directories >>>>> in cassandra.yaml has two entries. I would like to replace the 200GB disk with a >>>>> 500GB one >>>>> as it's running out of space, and to align it with the others we have in the >>>>> cluster. The plan is to stop C*, attach the new disk, move data from the 200GB disk to >>>>> the new one and mount it at the same point in the hierarchy. When done, start >>>>> C*. >>>>> >>>>> Additionally I would like to move some data from the old 500GB disk to the >>>>> new one to distribute used disk space equally. Probably all related files >>>>> for a single SSTable should be moved, i.e. >>>>> >>>>> foo-bar-ka-1630184-CompressionInfo.db >>>>> >>>>> foo-bar-ka-1630184-Data.db >>>>> >>>>> foo-bar-ka-1630184-Digest.sha1 >>>>> >>>>> foo-bar-ka-1630184-Filter.db >>>>> >>>>> foo-bar-ka-1630184-Index.db >>>>> >>>>> foo-bar-ka-1630184-Statistics.db >>>>> >>>>> foo-bar-ka-1630184-Summary.db >>>>> >>>>> foo-bar-ka-1630184-TOC.txt >>>>> >>>>> Is this something which should work or do you see some obstacles? (C* >>>>> 2.1.13). >>>>> -- >>>>> BR, >>>>> Michał Łowicki >>>>> >>>> >>> >>> >>> -- >>> BR, >>> Michał Łowicki >>> >> -- BR, Michał Łowicki
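On the pickWriteableDirectory question above: per Clint's description, Cassandra 2.1 favors the JBOD data directory with the most free space when placing new SSTables. A simplified model of that choice (this is not the actual Directories.java logic, which also blacklists directories that have failed, and the 3.x releases changed the algorithm again):

```python
def pick_writeable_directory(free_space_by_dir, estimated_write_size):
    """Pick the data directory with the most free space that can still fit
    the estimated write; return None if no directory has enough room.
    A toy model of Cassandra 2.1's destination choice for compactions/flushes."""
    candidates = {d: free for d, free in free_space_by_dir.items()
                  if free >= estimated_write_size}
    if not candidates:
        return None
    return max(candidates, key=candidates.get)
```

This is also why, after swapping in the bigger disk, data gradually rebalances on its own: the emptier directory keeps winning this comparison until the disks even out.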
Re: Replacing disks
On Sun, Feb 28, 2016 at 4:00 PM, Clint Martin < clintlmar...@coolfiretechnologies.com> wrote: > Your plan for replacing your 200GB drive sounds good to me. Since you are > running JBOD, I wouldn't worry about manually redistributing data from your > other disk to the new one. Cassandra will do that for you as it performs > compaction. > Is this done by pickWriteableDirectory <https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/db/Directories.java#L386> ? > While you're doing the drive change, you need to complete the swap and > restart of the node before the hinted handoff window expires on the other > nodes. If you do not complete in time, you'll want to perform a repair on > the node. > Yes. Thanks! > > > Clint > On Feb 28, 2016 9:33 AM, "Michał Łowicki" <mlowi...@gmail.com> wrote: > >> Hi, >> >> I have two disks on a single box (500GB + 200GB). data_file_directories in >> cassandra.yaml has two entries. I would like to replace the 200GB disk with a 500GB one as >> it's running out of space, and to align it with the others we have in the cluster. >> The plan is to stop C*, attach the new disk, move data from the 200GB disk to the new one >> and mount it at the same point in the hierarchy. When done, start C*. >> >> Additionally I would like to move some data from the old 500GB disk to the new >> one to distribute used disk space equally. Probably all related files for a >> single SSTable should be moved, i.e. >> >> foo-bar-ka-1630184-CompressionInfo.db >> >> foo-bar-ka-1630184-Data.db >> >> foo-bar-ka-1630184-Digest.sha1 >> >> foo-bar-ka-1630184-Filter.db >> >> foo-bar-ka-1630184-Index.db >> >> foo-bar-ka-1630184-Statistics.db >> >> foo-bar-ka-1630184-Summary.db >> >> foo-bar-ka-1630184-TOC.txt >> >> Is this something which should work or do you see some obstacles? (C* >> 2.1.13). >> -- >> BR, >> Michał Łowicki >> > -- BR, Michał Łowicki
Replacing disks
Hi, I have two disks on a single box (500GB + 200GB). data_file_directories in cassandra.yaml has two entries. I would like to replace the 200GB disk with a 500GB one as it's running out of space, and to align it with the others we have in the cluster. The plan is to stop C*, attach the new disk, move data from the 200GB disk to the new one and mount it at the same point in the hierarchy. When done, start C*. Additionally I would like to move some data from the old 500GB disk to the new one to distribute used disk space equally. Probably all related files for a single SSTable should be moved, i.e. foo-bar-ka-1630184-CompressionInfo.db foo-bar-ka-1630184-Data.db foo-bar-ka-1630184-Digest.sha1 foo-bar-ka-1630184-Filter.db foo-bar-ka-1630184-Index.db foo-bar-ka-1630184-Statistics.db foo-bar-ka-1630184-Summary.db foo-bar-ka-1630184-TOC.txt Is this something which should work or do you see some obstacles? (C* 2.1.13). -- BR, Michał Łowicki
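When moving SSTables by hand as described above, every component file of a generation has to travel together. A small helper that groups the 2.1-era ("ka") component files by SSTable generation, based on the naming pattern shown in the post:

```python
import re
from collections import defaultdict

# 2.1-era ("ka") SSTable filename: <keyspace>-<table>-ka-<generation>-<Component>.<ext>
COMPONENT_RE = re.compile(r"^(?P<sstable>.+-ka-\d+)-(?P<component>[^-]+)$")

def group_sstable_components(filenames):
    """Group component files by SSTable so that all pieces of one SSTable
    (Data, Index, Filter, Summary, TOC, ...) are moved to the same disk."""
    groups = defaultdict(list)
    for name in filenames:
        m = COMPONENT_RE.match(name)
        if m:
            groups[m.group("sstable")].append(name)
    return dict(groups)
```

A move script would iterate over the groups and relocate each group atomically (move, then verify all eight components arrived) while Cassandra is stopped.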
Re: Increase compaction performance
I had to decrease streaming throughput to 10 (from the default 200) in order to avoid the effect of a rising number of SSTables and compaction tasks while running repair. It's working very slowly but it's stable and doesn't hurt the whole cluster. Will try to adjust the configuration gradually to see if I can make it any better. Thanks! On Thu, Feb 11, 2016 at 8:10 PM, Michał Łowicki <mlowi...@gmail.com> wrote: > > > On Thu, Feb 11, 2016 at 5:38 PM, Alain RODRIGUEZ <arodr...@gmail.com> > wrote: > >> Also, are you using incremental repairs (not sure about the available >> options in Spotify Reaper)? What command did you run? >> >> > No. > > >> 2016-02-11 17:33 GMT+01:00 Alain RODRIGUEZ <arodr...@gmail.com>: >> >>> CPU load is fine, SSD disks below 30% utilization, no long GC pauses >>> >>> >>> >>> What is your current compaction throughput ? The current value of >>> 'concurrent_compactors' (cassandra.yaml or through JMX) ? >>> >> > > Throughput was initially set to 1024 and I've gradually increased it to > 2048, 4K and 16K but haven't seen any changes. Tried to change it both from > `nodetool` and also cassandra.yaml (with restart after changes). > > >> >>> nodetool getcompactionthroughput >>> >>> How to speed up compaction? Increased compaction throughput and >>>> concurrent compactors but no change. Seems there are plenty of idle >>>> resources but I can't force C* to use them. >>>> >>> >>> You might want to try un-throttling the compaction throughput through: >>> >>> nodetool setcompactionthroughput 0 >>> >>> Choose a canary node. Monitor pending compactions and disk throughput >>> (make sure the server is ok too - CPU...) >>> >> > > Yes, I'll try it out but if increasing it 16 times didn't help I'm a bit > sceptical about it. > > >> >>> Some other information could be useful: >>> >>> What is your number of cores per machine and the compaction strategies >>> for the 'most compacting' tables? What are the write/update patterns, any TTL >>> or tombstones ? 
Do you use a high number of vnodes ? >> > I'm using bare-metal boxes, 40 CPUs, 64GB, 2 SSDs each. num_tokens is set to > 256. > > Using LCS for all tables. Write / update heavy. No warnings about a large > number of tombstones but we're removing items frequently. > > > >> >>> Also what is your repair routine and your values for gc_grace_seconds ? >>> When was your last repair and do you think your cluster is suffering from >>> high entropy ? >>> >> > We've been having problems with repair for months (CASSANDRA-9935). > gc_grace_seconds is set to 345600 now. Yes, as we haven't launched it > successfully for a long time I guess the cluster is suffering from high entropy. > > >> >>> You can lower the stream throughput to make sure nodes can cope with >>> what repairs are feeding them. >>> >>> nodetool getstreamthroughput >>> nodetool setstreamthroughput X >>> >> > Yes, this sounds interesting. As we've been having problems with repair for > months it could be that lots of things are being transferred between nodes. > > Thanks! > > >> >>> C*heers, >>> >>> - >>> Alain Rodriguez >>> France >>> >>> The Last Pickle >>> http://www.thelastpickle.com >>> >>> 2016-02-11 16:55 GMT+01:00 Michał Łowicki <mlowi...@gmail.com>: >>> >>>> Hi, >>>> >>>> Using 2.1.12 across 3 DCs. Each DC has 8 nodes. Trying to run repair >>>> using Cassandra Reaper but after a couple of hours nodes are full of pending >>>> compaction tasks (regular ones, not validation compactions) >>>> >>>> CPU load is fine, SSD disks below 30% utilization, no long GC pauses. >>>> >>>> How to speed up compaction? Increased compaction throughput and >>>> concurrent compactors but no change. Seems there are plenty of idle >>>> resources but I can't force C* to use them. >>>> >>>> Any clue where there might be a bottleneck? >>>> >>>> >>>> -- >>>> BR, >>>> Michał Łowicki >>>> >>>> >>> >> > > > -- > BR, > Michał Łowicki > -- BR, Michał Łowicki
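To watch whether a canary node keeps up after changing the throttles, the pending count from `nodetool compactionstats` can be scraped periodically; a sketch with a fabricated sample of the output (the column layout varies a bit between versions, but the "pending tasks" line is stable):

```python
import re

def pending_tasks(compactionstats_output):
    """Extract the 'pending tasks' count from `nodetool compactionstats` output.
    Returns None if the line isn't found (e.g. nodetool failed)."""
    m = re.search(r"pending tasks:\s*(\d+)", compactionstats_output)
    return int(m.group(1)) if m else None

# Fabricated sample resembling the output discussed in the thread:
SAMPLE = """pending tasks: 118
   compaction type   keyspace          table   completed      total   unit   progress
        Compaction       sync   entity_by_id     4.91 GB   12.34 GB  bytes     39.79%
"""
```

Feeding this number into the alerting stack makes the "pending compactions explode during repair" pattern visible before disks fill up.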
Re: Increase compaction performance
On Thu, Feb 11, 2016 at 5:38 PM, Alain RODRIGUEZ <arodr...@gmail.com> wrote: > Also, are you using incremental repairs (not sure about the available > options in Spotify Reaper)? What command did you run? > > No. > 2016-02-11 17:33 GMT+01:00 Alain RODRIGUEZ <arodr...@gmail.com>: > >> CPU load is fine, SSD disks below 30% utilization, no long GC pauses >> >> >> >> What is your current compaction throughput ? The current value of >> 'concurrent_compactors' (cassandra.yaml or through JMX) ? >> > Throughput was initially set to 1024 and I've gradually increased it to 2048, 4K and 16K but haven't seen any changes. Tried to change it both from `nodetool` and also cassandra.yaml (with restart after changes). > >> nodetool getcompactionthroughput >> >> How to speed up compaction? Increased compaction throughput and >>> concurrent compactors but no change. Seems there are plenty of idle >>> resources but I can't force C* to use them. >>> >> >> You might want to try un-throttling the compaction throughput through: >> >> nodetool setcompactionthroughput 0 >> >> Choose a canary node. Monitor pending compactions and disk throughput >> (make sure the server is ok too - CPU...) >> > Yes, I'll try it out but if increasing it 16 times didn't help I'm a bit sceptical about it. > >> Some other information could be useful: >> >> What is your number of cores per machine and the compaction strategies >> for the 'most compacting' tables? What are the write/update patterns, any TTL >> or tombstones ? Do you use a high number of vnodes ? >> > I'm using bare-metal boxes, 40 CPUs, 64GB, 2 SSDs each. num_tokens is set to 256. Using LCS for all tables. Write / update heavy. No warnings about a large number of tombstones but we're removing items frequently. > >> Also what is your repair routine and your values for gc_grace_seconds ? >> When was your last repair and do you think your cluster is suffering from >> high entropy ? >> > We've been having problems with repair for months (CASSANDRA-9935). 
gc_grace_seconds is set to 345600 now. Yes, as we haven't launched it successfully for a long time I guess the cluster is suffering from high entropy. > >> You can lower the stream throughput to make sure nodes can cope with what >> repairs are feeding them. >> >> nodetool getstreamthroughput >> nodetool setstreamthroughput X >> > Yes, this sounds interesting. As we've been having problems with repair for months it could be that lots of things are being transferred between nodes. Thanks! > >> C*heers, >> >> - >> Alain Rodriguez >> France >> >> The Last Pickle >> http://www.thelastpickle.com >> >> 2016-02-11 16:55 GMT+01:00 Michał Łowicki <mlowi...@gmail.com>: >> >>> Hi, >>> >>> Using 2.1.12 across 3 DCs. Each DC has 8 nodes. Trying to run repair >>> using Cassandra Reaper but after a couple of hours nodes are full of pending >>> compaction tasks (regular ones, not validation compactions) >>> >>> CPU load is fine, SSD disks below 30% utilization, no long GC pauses. >>> >>> How to speed up compaction? Increased compaction throughput and >>> concurrent compactors but no change. Seems there are plenty of idle >>> resources but I can't force C* to use them. >>> >>> Any clue where there might be a bottleneck? >>> >>> >>> -- >>> BR, >>> Michał Łowicki >>> >>> >> > -- BR, Michał Łowicki
Increase compaction performance
Hi, Using 2.1.12 across 3 DCs. Each DC has 8 nodes. Trying to run repair using Cassandra Reaper but after a couple of hours nodes are full of pending compaction tasks (regular ones, not validation compactions). CPU load is fine, SSD disks below 30% utilization, no long GC pauses. How to speed up compaction? Increased compaction throughput and concurrent compactors but no change. Seems there are plenty of idle resources but I can't force C* to use them. Any clue where there might be a bottleneck? -- BR, Michał Łowicki
Much less connected native clients after node join
Hi, I'm using Python Driver 2.7.2 connected to a C* 2.1.11 cluster in two DCs. I had to reboot and rejoin one node and noticed that after a successful join the number of connected native clients was much lower than on the other nodes (blue line on the attached graph). It hadn't fixed itself after many hours, so at ~9:50 I restarted the newly joined node and everything looked much better. I guess the expected behaviour would be to converge to the same number of connected clients after some time. -- BR, Michał Łowicki
compaction became super slow after interrupted repair
Hi, Running a C* 2.1.8 cluster in two data centers with 6 nodes each. I've started running repair sequentially on each node (`nodetool repair --parallel --in-local-dc`). While running repair the number of SSTables grows radically, as do pending compaction tasks. That's usually fine, as a node normally recovers within a couple of hours after finishing repair ( https://www.dropbox.com/s/xzcndf5596mq7rm/Screenshot%202015-09-26%2016.17.44.png?dl=0). One experiment showed that increasing compaction throughput and the number of compactors mitigates this problem. Unfortunately one node didn't recover... ( https://www.dropbox.com/s/nphnsaf2rbfm0bq/Screenshot%202015-09-26%2016.20.56.png?dl=0). I needed to interrupt repair as the node was running out of disk space. I hoped that within a couple of hours the node would catch up with compaction, but it didn't happen even after 5 days. I've tried increasing throughput, disabling throttling, increasing the number of compactors, disabling binary / thrift / gossip, increasing heap size, restarting, but compaction is still super slow. Tried today to run scrub: root@db2:~# nodetool scrub sync Aborted scrubbing atleast one column family in keyspace sync, check server logs for more information. error: nodetool failed, check server logs -- StackTrace -- java.lang.RuntimeException: nodetool failed, check server logs at org.apache.cassandra.tools.NodeTool$NodeToolCmd.run(NodeTool.java:290) at org.apache.cassandra.tools.NodeTool.main(NodeTool.java:202) as well as cleanup: root@db2:~# nodetool cleanup Aborted cleaning up atleast one column family in keyspace sync, check server logs for more information. 
error: nodetool failed, check server logs -- StackTrace -- java.lang.RuntimeException: nodetool failed, check server logs at org.apache.cassandra.tools.NodeTool$NodeToolCmd.run(NodeTool.java:290) at org.apache.cassandra.tools.NodeTool.main(NodeTool.java:202) Couldn't find anything in the logs regarding these runtime exceptions (see the log here - https://www.dropbox.com/s/flmii7fgpyp07q2/db2.lati.system.log?dl=0). Note that I'm experiencing CASSANDRA-9935 while running repair on each node in the cluster. Any help will be much appreciated. -- BR, Michał Łowicki
Re: Garbage collector launched on all nodes at once
Looks like memtable heap size is growing rapidly on some nodes ( https://www.dropbox.com/s/3brloiy3fqang1r/Screenshot%202015-06-17%2019.21.49.png?dl=0). The drops are the points where nodes were restarted. On Wed, Jun 17, 2015 at 6:53 PM, Michał Łowicki mlowi...@gmail.com wrote: Hi, Two datacenters with 6 nodes (2.1.6) each. In each DC garbage collection is launched at the same time on every node (See [1] for total GC duration per 5 seconds). RF is set to 3. Any ideas? [1] https://www.dropbox.com/s/bsbyew1jxbe3dgo/Screenshot%202015-06-17%2018.49.48.png?dl=0 -- BR, Michał Łowicki -- BR, Michał Łowicki
Garbage collector launched on all nodes at once
Hi, Two datacenters with 6 nodes (2.1.6) each. In each DC garbage collection is launched at the same time on each node (See [1] for total GC duration per 5 seconds). RF is set to 3. Any ideas? [1] https://www.dropbox.com/s/bsbyew1jxbe3dgo/Screenshot%202015-06-17%2018.49.48.png?dl=0 -- BR, Michał Łowicki
Re: How to interpret some GC logs
On Tue, Jun 2, 2015 at 9:06 AM, Sebastian Martinka sebastian.marti...@mercateo.com wrote: this should help you: https://blogs.oracle.com/poonam/entry/understanding_cms_gc_logs I don't see such a format there. The options passed related to GC are: -XX:+PrintGCDateStamps -Xloggc:/var/log/cassandra/gc.log Best Regards, Sebastian Martinka *From:* Michał Łowicki [mailto:mlowi...@gmail.com] *Sent:* Monday, 1 June 2015 11:47 *To:* user@cassandra.apache.org *Subject:* How to interpret some GC logs Hi, Normally I get logs like: 2015-06-01T09:19:50.610+: 4736.314: [GC 6505591K->4895804K(8178944K), 0.0494560 secs] which is fine and understandable but occasionally I see something like: 2015-06-01T09:19:50.661+: 4736.365: [GC 4901600K(8178944K), 0.0049600 secs] How to interpret it? Is it just missing the part before `->`, i.e. the memory occupied before the GC cycle? -- BR, Michał Łowicki -- BR, Michał Łowicki
Re: How to interpret some GC logs
On Mon, Jun 1, 2015 at 7:25 PM, Jason Wee peich...@gmail.com wrote: can you tell what JVM that is? root@db2:~# java -version java version "1.7.0_80" Java(TM) SE Runtime Environment (build 1.7.0_80-b15) Java HotSpot(TM) 64-Bit Server VM (build 24.80-b11, mixed mode) jason On Mon, Jun 1, 2015 at 5:46 PM, Michał Łowicki mlowi...@gmail.com wrote: Hi, Normally I get logs like: 2015-06-01T09:19:50.610+: 4736.314: [GC 6505591K->4895804K(8178944K), 0.0494560 secs] which is fine and understandable but occasionally I see something like: 2015-06-01T09:19:50.661+: 4736.365: [GC 4901600K(8178944K), 0.0049600 secs] How to interpret it? Is it just missing the part before `->`, i.e. the memory occupied before the GC cycle? -- BR, Michał Łowicki -- BR, Michał Łowicki
How to interpret some GC logs
Hi, Normally I get logs like: 2015-06-01T09:19:50.610+: 4736.314: [GC 6505591K->4895804K(8178944K), 0.0494560 secs] which is fine and understandable but occasionally I see something like: 2015-06-01T09:19:50.661+: 4736.365: [GC 4901600K(8178944K), 0.0049600 secs] How to interpret it? Is it just missing the part before `->`, i.e. the memory occupied before the GC cycle? -- BR, Michał Łowicki
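Both line shapes can be handled with one parser. In the long form the numbers are heap occupancy before and after the collection plus total heap size; in the short form there is only a single occupancy snapshot with the total (my reading, based on the Oracle CMS GC log write-up linked in the thread). A sketch:

```python
import re

# Matches the tail of both -Xloggc line shapes seen in the thread:
#   [GC 6505591K->4895804K(8178944K), 0.0494560 secs]   before->after(total)
#   [GC 4901600K(8178944K), 0.0049600 secs]             occupancy(total) only
GC_RE = re.compile(
    r"\[GC (?:(?P<before>\d+)K->)?(?P<after>\d+)K\((?P<total>\d+)K\), "
    r"(?P<secs>[\d.]+) secs\]")

def parse_gc_line(line):
    """Return before/after/total heap occupancy in KB and pause seconds,
    with before_kb=None for the short, occupancy-only form."""
    m = GC_RE.search(line)
    if not m:
        return None
    return {
        "before_kb": int(m.group("before")) if m.group("before") else None,
        "after_kb": int(m.group("after")),
        "total_kb": int(m.group("total")),
        "secs": float(m.group("secs")),
    }
```

Running this over gc.log and summing `secs` per time bucket reproduces the kind of "total GC duration per 5 seconds" graph shown in the other GC thread.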
Compaction freezes
Hi, Using C* 2.1.5 and a table with leveled compaction, I've found that the number of pending tasks is around 100. It turned out that one table cannot be compacted and compaction always stops at the same point (more or less): Compaction sync entity_by_id 4.91 GB 12.34 GB bytes 39.79% `service cassandra restart` Compaction sync entity_by_id 4.91 GB 12.35 GB bytes 39.74% `service cassandra restart` Compaction sync entity_by_id 4.9 GB 12.33 GB bytes 39.77% `service cassandra restart` Compaction sync entity_by_id 4.89 GB 12.32 GB bytes 39.73% After doubling the heap size (cassandra-env.sh) compaction went fine, but I consider this change a temporary solution, and after a while compaction started to freeze again anyway. Is there any way to get insight into compaction, to get an answer as to why it freezes and what the compactor is currently doing? I've enabled DEBUG logging but it's too verbose as the node is getting some traffic. Can I enable DEBUG for compaction only? -- BR, Michał Łowicki
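On the last question: C* 2.1 logs through logback, so compaction-only DEBUG should be possible without raising verbosity globally. Two options (double-check the package name against your version's source tree): at runtime, `nodetool setlogginglevel org.apache.cassandra.db.compaction DEBUG`, or persistently via a logger entry in conf/logback.xml:

```xml
<!-- conf/logback.xml: DEBUG for the compaction package only,
     leaving the root logger at INFO -->
<logger name="org.apache.cassandra.db.compaction" level="DEBUG"/>
```

The nodetool variant takes effect immediately and resets on restart, which suits a one-off investigation like this.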
Re: C* 2.1.2 invokes oom-killer
After a couple of days it's still behaving fine. Case closed. On Thu, Feb 19, 2015 at 11:15 PM, Michał Łowicki mlowi...@gmail.com wrote: Upgrade to 2.1.3 seems to have helped so far. After ~12 hours total memory consumption grew from 10GB to 10.5GB. On Thu, Feb 19, 2015 at 2:02 PM, Carlos Rolo r...@pythian.com wrote: Then you are probably hitting a bug... Trying to find it in Jira. The bad news is the fix is only to be released in 2.1.4. Once I find it I will post it here. Regards, Carlos Juzarte Rolo Cassandra Consultant Pythian - Love your data rolo@pythian | Twitter: cjrolo | Linkedin: *linkedin.com/in/carlosjuzarterolo http://linkedin.com/in/carlosjuzarterolo* Tel: 1649 www.pythian.com On Thu, Feb 19, 2015 at 12:16 PM, Michał Łowicki mlowi...@gmail.com wrote: |trickle_fsync| has been enabled for a long time in our settings (just noticed): trickle_fsync: true trickle_fsync_interval_in_kb: 10240 On Thu, Feb 19, 2015 at 12:12 PM, Michał Łowicki mlowi...@gmail.com wrote: On Thu, Feb 19, 2015 at 11:02 AM, Carlos Rolo r...@pythian.com wrote: Do you have trickle_fsync enabled? Try to enable that and see if it solves your problem, since you are running out of non-heap memory. Another question: is it always the same nodes that die? Or any 2 out of 4? Always the same nodes. Upgraded to 2.1.3 two hours ago so we'll monitor whether the issue has been fixed there. If not, will try to enable |trickle_fsync| Regards, Carlos Juzarte Rolo Cassandra Consultant Pythian - Love your data rolo@pythian | Twitter: cjrolo | Linkedin: *linkedin.com/in/carlosjuzarterolo http://linkedin.com/in/carlosjuzarterolo* Tel: 1649 www.pythian.com On Thu, Feb 19, 2015 at 10:49 AM, Michał Łowicki mlowi...@gmail.com wrote: On Thu, Feb 19, 2015 at 10:41 AM, Carlos Rolo r...@pythian.com wrote: So compaction doesn't seem to be your problem (You can check with nodetool compactionstats just to be sure). pending tasks: 0 How much is your write latency on your column families? 
I had OOM related to this before, and there was a tipping point around 70ms. Write request latency is below 0.05 ms/op (avg). Checked with OpsCenter. -- -- BR, Michał Łowicki -- -- BR, Michał Łowicki -- BR, Michał Łowicki -- -- BR, Michał Łowicki -- BR, Michał Łowicki
Re: C* 2.1.2 invokes oom-killer
In all tables the SSTable count is below 30. On Thu, Feb 19, 2015 at 9:43 AM, Carlos Rolo r...@pythian.com wrote: Can you check how many SSTables you have? It is a more or less known fact that 2.1.2 has lots of problems with compaction, so an upgrade may solve it. But a high number of SSTables would confirm that compaction is indeed your problem and not something else. Regards, Carlos Juzarte Rolo Cassandra Consultant Pythian - Love your data rolo@pythian | Twitter: cjrolo | Linkedin: *linkedin.com/in/carlosjuzarterolo http://linkedin.com/in/carlosjuzarterolo* Tel: 1649 www.pythian.com On Thu, Feb 19, 2015 at 9:16 AM, Michał Łowicki mlowi...@gmail.com wrote: We don't have other things running on these boxes and C* is consuming all the memory. Will try to upgrade to 2.1.3 and if that doesn't help, downgrade to 2.1.2. — Michał On Thu, Feb 19, 2015 at 2:39 AM, Jacob Rhoden jacob.rho...@me.com wrote: Are you tweaking the nice priority on Cassandra? (Type `man nice` if you don't know much about it.) Certainly improving Cassandra's nice score becomes important when you have other things running on the server, like scheduled jobs or people logging in to the server and doing things. __ Sent from iPhone On 19 Feb 2015, at 5:28 am, Michał Łowicki mlowi...@gmail.com wrote: Hi, A couple of times a day 2 out of 4 cluster nodes are killed root@db4:~# dmesg | grep -i oom [4811135.792657] [ pid ] uid tgid total_vm rss cpu oom_adj oom_score_adj name [6559049.307293] java invoked oom-killer: gfp_mask=0x201da, order=0, oom_adj=0, oom_score_adj=0 Nodes are using an 8GB heap (confirmed with *nodetool info*) and aren't using row cache. Noticed that a couple of times a day used RSS grows really fast within a couple of minutes and I see CPU spikes at the same time - https://www.dropbox.com/s/khco2kdp4qdzjit/Screenshot%202015-02-18%2015.10.54.png?dl=0 . Could be related to compaction, but after compaction is finished the used RSS doesn't shrink. 
Output from pmap when the C* process uses 50GB RAM (out of 64GB) is available at http://paste.ofcode.org/ZjLUA2dYVuKvJHAk9T3Hjb. At the time the dump was made, heap usage was far below 8GB (~3GB) but total RSS was ~50GB. Any help will be appreciated. -- BR, Michał Łowicki -- BR, Michał Łowicki
Re: C* 2.1.2 invokes oom-killer
On Thu, Feb 19, 2015 at 10:41 AM, Carlos Rolo r...@pythian.com wrote:

So compaction doesn't seem to be your problem (you can check with nodetool compactionstats just to be sure).

pending tasks: 0

How much is your write latency on your column families? I had OOM related to this before, and there was a tipping point around 70ms.

Write request latency is below 0.05 ms/op (avg). Checked with OpsCenter.

--
BR,
Michał Łowicki
C* 2.1.2 invokes oom-killer
Hi,

Couple of times a day 2 out of 4 nodes in the cluster are killed:

root@db4:~# dmesg | grep -i oom
[4811135.792657] [ pid ] uid tgid total_vm rss cpu oom_adj oom_score_adj name
[6559049.307293] java invoked oom-killer: gfp_mask=0x201da, order=0, oom_adj=0, oom_score_adj=0

Nodes are using an 8GB heap (confirmed with *nodetool info*) and aren't using the row cache. I noticed that a couple of times a day the used RSS grows really fast within a couple of minutes, and I see CPU spikes at the same time - https://www.dropbox.com/s/khco2kdp4qdzjit/Screenshot%202015-02-18%2015.10.54.png?dl=0 . It could be related to compaction, but after compaction finishes the used RSS doesn't shrink.

Output from pmap when the C* process uses 50GB RAM (out of 64GB) is available at http://paste.ofcode.org/ZjLUA2dYVuKvJHAk9T3Hjb. At the time the dump was made, heap usage was far below 8GB (~3GB) but total RSS was ~50GB. Any help will be appreciated.

--
BR,
Michał Łowicki
Re: Timeouts but returned consistency level is invalid
Thanks Philip. This explains why I see ALL. Any idea why sometimes ONE is returned?

— Michał

On Fri, Jan 30, 2015 at 4:18 PM, Philip Thompson philip.thomp...@datastax.com wrote:

Jan is incorrect. Keyspaces do not have consistency levels set on them. Consistency levels are always set by the client. You are almost certainly running into https://issues.apache.org/jira/browse/CASSANDRA-7947 which is fixed in 2.1.3 and 2.0.12.

On Fri, Jan 30, 2015 at 8:37 AM, Michał Łowicki mlowi...@gmail.com wrote:

Hi Jan,

I'm using only one keyspace. Even if it defaults to ONE, why is ALL sometimes returned?

On Fri, Jan 30, 2015 at 2:28 PM, Jan cne...@yahoo.com wrote:

Hi Michał,

The consistency level defaults to ONE for all write and read operations. However, the consistency level is also set for the keyspace. Could it be possible that your queries span multiple keyspaces which bear different levels of consistency?

Cheers,
Jan
C* Architect

On Friday, January 30, 2015 1:36 AM, Michał Łowicki mlowi...@gmail.com wrote:

Hi,

We're using C* 2.1.2 and django-cassandra-engine, which in turn uses cqlengine. LOCAL_QUORUM is set as the default consistency level. From time to time we get timeouts while talking to the database, but what is strange is that the returned consistency level is not LOCAL_QUORUM:

code=1200 [Coordinator node timed out waiting for replica nodes' responses] message=Operation timed out - received only 3 responses. info={'received_responses': 3, 'required_responses': 4, 'consistency': 'ALL'}
code=1200 [Coordinator node timed out waiting for replica nodes' responses] message=Operation timed out - received only 1 responses. info={'received_responses': 1, 'required_responses': 2, 'consistency': 'LOCAL_QUORUM'}
code=1100 [Coordinator node timed out waiting for replica nodes' responses] message=Operation timed out - received only 0 responses. info={'received_responses': 0, 'required_responses': 1, 'consistency': 'ONE'}

Any idea why it might happen?

--
BR,
Michał Łowicki
Timeouts but returned consistency level is invalid
Hi,

We're using C* 2.1.2 and django-cassandra-engine, which in turn uses cqlengine. LOCAL_QUORUM is set as the default consistency level. From time to time we get timeouts while talking to the database, but what is strange is that the returned consistency level is not LOCAL_QUORUM:

code=1200 [Coordinator node timed out waiting for replica nodes' responses] message=Operation timed out - received only 3 responses. info={'received_responses': 3, 'required_responses': 4, 'consistency': 'ALL'}
code=1200 [Coordinator node timed out waiting for replica nodes' responses] message=Operation timed out - received only 1 responses. info={'received_responses': 1, 'required_responses': 2, 'consistency': 'LOCAL_QUORUM'}
code=1100 [Coordinator node timed out waiting for replica nodes' responses] message=Operation timed out - received only 0 responses. info={'received_responses': 0, 'required_responses': 1, 'consistency': 'ONE'}

Any idea why it might happen?

--
BR,
Michał Łowicki
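Until the cluster is on a release with CASSANDRA-7947 fixed, the consistency level reported in these timeouts cannot be trusted, so it helps to cross-check the error's info dict against what the client actually requested. A small illustrative sketch (the helper names are hypothetical, not driver API; the info dict shape matches the errors quoted above):

```python
def local_quorum_required(rf_local_dc):
    """Responses LOCAL_QUORUM actually needs: floor(RF/2) + 1 in the local DC."""
    return rf_local_dc // 2 + 1

def looks_misreported(error_info, requested_cl, required_for_requested):
    """True when the CL or required-response count in the error doesn't match
    what the client requested -- the signature of CASSANDRA-7947."""
    return (error_info.get('consistency') != requested_cl
            or error_info.get('required_responses') != required_for_requested)

# The first error from the post, with LOCAL_QUORUM requested and RF=3 assumed:
info = {'received_responses': 3, 'required_responses': 4, 'consistency': 'ALL'}
print(looks_misreported(info, 'LOCAL_QUORUM', local_quorum_required(3)))  # True
```

A mismatch flagged this way points at the reporting bug rather than at a keyspace or driver misconfiguration; the timeouts themselves are still real and need investigating separately.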
Re: Timeouts but returned consistency level is invalid
Hi Jan,

I'm using only one keyspace. Even if it defaults to ONE, why is ALL sometimes returned?

On Fri, Jan 30, 2015 at 2:28 PM, Jan cne...@yahoo.com wrote:

Hi Michał,

The consistency level defaults to ONE for all write and read operations. However, the consistency level is also set for the keyspace. Could it be possible that your queries span multiple keyspaces which bear different levels of consistency?

Cheers,
Jan
C* Architect

On Friday, January 30, 2015 1:36 AM, Michał Łowicki mlowi...@gmail.com wrote:

Hi,

We're using C* 2.1.2 and django-cassandra-engine, which in turn uses cqlengine. LOCAL_QUORUM is set as the default consistency level. From time to time we get timeouts while talking to the database, but what is strange is that the returned consistency level is not LOCAL_QUORUM:

code=1200 [Coordinator node timed out waiting for replica nodes' responses] message=Operation timed out - received only 3 responses. info={'received_responses': 3, 'required_responses': 4, 'consistency': 'ALL'}
code=1200 [Coordinator node timed out waiting for replica nodes' responses] message=Operation timed out - received only 1 responses. info={'received_responses': 1, 'required_responses': 2, 'consistency': 'LOCAL_QUORUM'}
code=1100 [Coordinator node timed out waiting for replica nodes' responses] message=Operation timed out - received only 0 responses. info={'received_responses': 0, 'required_responses': 1, 'consistency': 'ONE'}

Any idea why it might happen?

--
BR,
Michał Łowicki
Inconsistencies between two tables if BATCH used
Hi,

We have two tables:

* The first one, *entity*, has a log-like structure - whenever an entity is modified we create a new version of it and put it into the table with a new mtime, which is part of the compound key. The old one is removed.
* The second one, called *entity_by_id*, is a manually managed index for *entity*. Having only an id, you can get basic entity attributes from *entity_by_id*.

While adding an entity we do two inserts - to *entity* and *entity_by_id* (in this order). While deleting we do the same, in the same order, so first we remove the record from the *entity* table.

It turned out that these two tables were inconsistent. We had ~260 records in *entity_by_id* for which there is no corresponding record in *entity*. For the *entity* table it's much worse: ~7000 of its records are missing from *entity_by_id*, and that number was growing much faster. We were using LOCAL_QUORUM. C* 2.1.2. Two datacenters. We didn't get any exceptions on inserts or deletes. BatchQuery from cqlengine (0.20.0) has been used.

If BatchQuery is not used:

with BatchQuery() as b:
-entity.batch(b).save()
-entity_by_id = EntityById.copy_fields_from(entity)
-entity_by_id.batch(b).save()
+entity.save()
+entity_by_id = EntityById.copy_fields_from(entity)
+entity_by_id.save()

everything is fine and we don't get more inconsistencies. I've checked what cqlengine generates and it seems to work as expected:

('BEGIN BATCH\n UPDATE sync.entity SET name = %(4)s WHERE user_id = %(0)s AND data_type_id = %(1)s AND version = %(2)s AND id = %(3)s\n INSERT INTO sync.entity_by_id (user_id, id, parent_id, deleted, folder, data_type_id, version) VALUES (%(5)s, %(6)s, %(7)s, %(8)s, %(9)s, %(10)s, %(11)s)\nAPPLY BATCH;',)

We suspect it's a problem in C* itself. Any ideas how to debug what is going on, as BATCH is needed in this case?

--
BR,
Michał Łowicki
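While tracking down drift like this, a one-off reconciliation pass over the two tables makes the problem measurable. A minimal sketch, assuming the id sets of both tables have already been fetched (e.g. via paged SELECTs, which is out of scope here):

```python
def find_orphans(entity_ids, entity_by_id_ids):
    """Compare the id sets of the entity and entity_by_id tables.

    Returns (missing_in_entity, missing_in_index):
    - ids present in entity_by_id with no corresponding entity row
    - ids present in entity that were never indexed in entity_by_id
    """
    entity_ids = set(entity_ids)
    index_ids = set(entity_by_id_ids)
    return index_ids - entity_ids, entity_ids - index_ids

# Toy example: id 4 is indexed but has no entity; id 1 was never indexed.
missing_in_entity, missing_in_index = find_orphans({1, 2, 3}, {2, 3, 4})
print(sorted(missing_in_entity))  # [4]
print(sorted(missing_in_index))   # [1]
```

Running this periodically also shows whether the inconsistency count is still growing after a change such as dropping BATCH, which is more conclusive than a single snapshot.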
Re: Number of SSTables grows after repair
@Robert, could you point me to some of those issues? I would be very grateful for an explanation of why this is semi-expected.

On Fri, Jan 2, 2015 at 8:01 PM, Robert Coli rc...@eventbrite.com wrote:

On Mon, Dec 15, 2014 at 1:51 AM, Michał Łowicki mlowi...@gmail.com wrote:

We've noticed that the number of SSTables grows radically after running *repair*. What we did today was to compact everything, so on each node the number of SSTables was below 10. After repair it jumped to ~1600 on each node. What is interesting is that many of them are very small. The smallest ones are ~60 bytes in size (http://paste.ofcode.org/6yyH2X52emPNrKdw3WXW3d).

This is semi-expected if using vnodes. There are various tickets open to address aspects of this issue.

Table information - http://paste.ofcode.org/32RijfxQkNeb9cx9GAAnM45

We're using Cassandra 2.1.2.

https://engineering.eventbrite.com/what-version-of-cassandra-should-i-run/

=Rob

--
BR,
Michał Łowicki
Number of SSTables grows after repair
Hi,

We've noticed that the number of SSTables grows radically after running *repair*. What we did today was to compact everything, so on each node the number of SSTables was below 10. After repair it jumped to ~1600 on each node. What is interesting is that many of them are very small. The smallest ones are ~60 bytes in size (http://paste.ofcode.org/6yyH2X52emPNrKdw3WXW3d).

Table information - http://paste.ofcode.org/32RijfxQkNeb9cx9GAAnM45

We're using Cassandra 2.1.2.

--
BR,
Michał Łowicki