Re: Garbage collector launched on all nodes at once
It is probably this: https://issues.apache.org/jira/browse/CASSANDRA-9549

On Wed, Jun 17, 2015 at 7:37 PM, Michał Łowicki mlowi...@gmail.com wrote: [...]
Re: Garbage collector launched on all nodes at once
Looks like memtable heap size is growing rapidly on some nodes (https://www.dropbox.com/s/3brloiy3fqang1r/Screenshot%202015-06-17%2019.21.49.png?dl=0). The drops are where nodes have been restarted.

On Wed, Jun 17, 2015 at 6:53 PM, Michał Łowicki mlowi...@gmail.com wrote: [...]

--
BR,
Michał Łowicki
Re: Garbage collector launched on all nodes at once
Okay. IIRC memtables have been moved off heap; googled and got this: http://www.datastax.com/dev/blog/off-heap-memtables-in-Cassandra-2-1. Apparently, there are still some references on heap.

On Thu, Jun 18, 2015 at 1:11 PM, Marcus Eriksson krum...@gmail.com wrote: [...]
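For anyone tuning this: the relevant knob in a stock 2.1 cassandra.yaml should be memtable_allocation_type (heap_buffers keeps everything on heap; offheap_buffers/offheap_objects move progressively more off heap). A sketch only; the space caps below are assumed examples, verify against your version's defaults:

  # cassandra.yaml (2.1)
  memtable_allocation_type: offheap_objects
  # Optional caps; if unset the defaults derive from the heap size.
  # memtable_heap_space_in_mb: 2048
  # memtable_offheap_space_in_mb: 2048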
Minor compaction not triggered
Hi,

I have a Cassandra cluster of 6 nodes, with DateTiered compaction for the tables/CFs. For some reason, minor compaction never happens. I have enabled debug logging and I don't see any debug logs related to compaction like the following:

https://github.com/apache/cassandra/blob/cassandra-2.0/src/java/org/apache/cassandra/db/compaction/CompactionManager.java#L150
https://github.com/apache/cassandra/blob/cassandra-2.0/src/java/org/apache/cassandra/db/compaction/DateTieredCompactionStrategy.java#L127

As a result of no compactions, the cluster now has more than 50K SSTables per node. How do I debug this issue further? Appreciate any help.
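One way to scope the debug logging to just the compaction classes on 2.0 (stock conf/log4j-server.properties; the logger name matches the package in the links above), plus a quick liveness check via nodetool:

  # conf/log4j-server.properties
  log4j.logger.org.apache.cassandra.db.compaction=DEBUG

  $ nodetool compactionstats   # shows pending/active compaction tasks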
Re: Connection reset during repair service
Hi David, Edouard,

Depending on your data model on event_data, you might want to consider upgrading so you can use DTCS (C* 2.0.11+). Basically, if those tombstones are due to a constant TTL and this is a time series, it could be a real improvement. See:

https://labs.spotify.com/2014/12/18/date-tiered-compaction/
http://www.datastax.com/dev/blog/datetieredcompactionstrategy

I am not sure this is related to your problem, but having 8904 tombstones read at once is pretty bad. You might also want to paginate queries a bit, since it looks like you retrieve a lot of data at once. Meanwhile, if you are using STCS, you can consider performing a major compaction on a regular basis (taking the downsides of major compaction into consideration).

C*heers,

Alain

2015-06-12 15:08 GMT+02:00 David CHARBONNIER david.charbonn...@rgsystem.com:

Hi,

We're using Cassandra 2.0.8.39 through DataStax Enterprise 4.5.1 and we're experiencing issues with the OpsCenter (version 5.1.3) Repair Service. When the Repair Service is running, we can see repairs timing out on a few ranges in OpsCenter's event log viewer (see the screenshot attached). On our Cassandra nodes, we can see a lot of these messages in the cassandra/system.log file while a timeout shows up in OpsCenter:

ERROR [Native-Transport-Requests:3372] 2015-06-12 02:22:33,231 ErrorMessage.java (line 222) Unexpected exception during request
java.io.IOException: Connection reset by peer
at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
at sun.nio.ch.SocketDispatcher.read(Unknown Source)
at sun.nio.ch.IOUtil.readIntoNativeBuffer(Unknown Source)
at sun.nio.ch.IOUtil.read(Unknown Source)
at sun.nio.ch.SocketChannelImpl.read(Unknown Source)
at org.jboss.netty.channel.socket.nio.NioWorker.read(NioWorker.java:64)
at org.jboss.netty.channel.socket.nio.AbstractNioWorker.process(AbstractNioWorker.java:109)
at org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:312)
at org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:90)
at org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)

You'll find attached an extract of the system.log file with some more information. Do you have any idea what's happening? We suspect the timeouts happen because we have some tables with many tombstones, and a warning is sometimes triggered. We have edited the configuration to allow the warning but keep reads running until 1,000,000 tombstones are encountered. During a compaction, we also get warning messages telling us that we have a lot of tombstones:

WARN [CompactionExecutor:1584] 2015-06-11 19:22:24,904 SliceQueryFilter.java (line 225) Read 8640 live and 8904 tombstoned cells in rgsupv.event_data (see tombstone_warn_threshold). 1 columns was requested, slices=[-], delInfo={deletedAt=-9223372036854775808, localDeletion=2147483647}

Do you think it's related to our first problem? Our cluster is configured as follows:
- 8 nodes with Debian 7.8 x64
- 16 GB of memory and 4 CPUs
- 2 HDDs: 1 for the system and the other for the data directory

Best regards,

David CHARBONNIER
Sysadmin
T : +33 411 934 200
david.charbonn...@rgsystem.com
ZAC Aéroport, 125 Impasse Adam Smith, 34470 Pérols - France
www.rgsystem.com
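Following up on the DTCS suggestion above: once on 2.0.11+, the switch itself is a single statement. A hedged sketch (table name taken from the warning log line; options left at defaults, which you would want to tune to your TTL and write pattern):

  ALTER TABLE rgsupv.event_data
    WITH compaction = {'class': 'DateTieredCompactionStrategy'};

Note that changing the strategy causes each node to gradually recompact its SSTables into the new layout, so roll it out with an eye on disk headroom.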
Re: Using Cassandra and Twisted (Python)
Hello Alex,

thanks for your answer! I'll try posting there as well then!

Best,
Jonathan

On 06/16/2015 07:05 PM, Alex Popescu wrote:

Jonathan, I'm pretty sure you'll have better chances of getting this answered on the Python driver mailing list: https://groups.google.com/a/lists.datastax.com/forum/#!forum/python-driver-user

On Tue, Jun 16, 2015 at 1:01 AM, Jonathan Ballet jbal...@gfproducts.ch wrote:

Hi, I'd like to write some Python applications using Twisted to talk to a Cassandra cluster. It seems like the DataStax Python driver from https://github.com/datastax/python-driver does support Twisted, but it's not exactly clear how I would use this library along with Twisted. The documentation for the async API is very sparse, and there's no mention of how to plug this into the Twisted event loop. Does anyone have a small working example of how to use both of these? Thanks! Jonathan

--
Bests,
Alex Popescu | @al3xandru
Sen. Product Manager @ DataStax
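A minimal sketch of one way to wire the driver into Twisted, assuming the driver's TwistedConnection event loop and wrapping ResponseFuture callbacks in a Deferred (contact point, keyspace, and query are illustrative, not from the thread):

  from cassandra.cluster import Cluster
  from cassandra.io.twistedreactor import TwistedConnection
  from twisted.internet import defer, reactor

  def execute_deferred(session, query, params=None):
      # Bridge the driver's ResponseFuture into a Twisted Deferred.
      d = defer.Deferred()
      future = session.execute_async(query, params)
      # Callbacks may fire on a driver thread, so hop back onto the
      # reactor thread before touching the Deferred.
      future.add_callbacks(
          lambda rows: reactor.callFromThread(d.callback, rows),
          lambda exc: reactor.callFromThread(d.errback, exc),
      )
      return d

  @defer.inlineCallbacks
  def main():
      cluster = Cluster(['127.0.0.1'], connection_class=TwistedConnection)
      session = cluster.connect('mykeyspace')  # keyspace is illustrative
      rows = yield execute_deferred(session, 'SELECT * FROM mytable LIMIT 10')
      for row in rows:
          print(row)
      cluster.shutdown()
      reactor.stop()

  reactor.callWhenRunning(main)
  reactor.run()

This keeps all Cassandra I/O asynchronous and lets the reactor own the application lifecycle; whether TwistedConnection already delivers callbacks on the reactor thread is version-dependent, hence the defensive callFromThread.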
spark-sql estimates Cassandra table with 3 rows as 8 TB of data, Cassandra 2.1, DSE 4.7
Hi,

spark-sql estimated the input for a Cassandra table with 3 rows as 8 TB; sometimes it's estimated as -167 B. I run it on a laptop, and I don't have 8 TB of space for the data. We use DSE 4.7 with the bundled Spark and the Spark SQL Thrift server. Here are the stats for a dummy select foo from bar, where bar has three rows and several columns:

- Total task time across all tasks: 7.6 min
- Input: 8388608.0 TB

I don't have that many TB on my MacBook Pro. I would like to, but I don't :(
Re: Connection reset during repair service
Do you do a ton of random updates and deletes? That would not be a good workload for DTCS. Where are all your tombstones coming from?

On Jun 17, 2015 3:43 AM, Alain RODRIGUEZ arodr...@gmail.com wrote: [...]
Re: Connection reset during repair service
Regarding the DataStax Repair Service, I saw the same error over here. Here is the DataStax answer, FWIW:

The repair service timeout message is telling you that the service has not received a response from the nodetool repair process running on Cassandra within the configured (default) 3600 seconds. When this happens, the OpsCenter Repair Service stops monitoring the progress and places the subrange repair request at the back of a queue to be re-run at a later time. It is not necessarily indicative of a repair failure, but it does suggest that the repair process is taking longer than expected for some reason, typically due to a hang, network issues, or wide rows on the table being repaired.

As a possible workaround, you can increase the timeout value in OpsCenter by raising the timeout period in opscenterd.conf or cluster_name.conf (the cluster file takes precedence), but if there is an underlying issue with repairs completing on Cassandra, this will not help:

single_repair_timeout = 3600

(See: http://docs.datastax.com/en/opscenter/4.1/opsc/online_help/services/repairServiceAdvancedConfiguration.html)

2015-06-17 15:21 GMT+02:00 Sebastian Estevez sebastian.este...@datastax.com: [...]
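For reference, a hedged example of the override (section name assumed from the OpsCenter docs; 7200 is just an illustrative value, not a recommendation):

  # cluster_name.conf (takes precedence over opscenterd.conf)
  [repair_service]
  single_repair_timeout = 7200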
Garbage collector launched on all nodes at once
Hi,

Two datacenters with 6 nodes (2.1.6) each. In each DC, garbage collection is launched at the same time on each node (see [1] for total GC duration per 5 seconds). RF is set to 3. Any ideas?

[1] https://www.dropbox.com/s/bsbyew1jxbe3dgo/Screenshot%202015-06-17%2018.49.48.png?dl=0

--
BR,
Michał Łowicki