Re: Garbage collector launched on all nodes at once

2015-06-17 Thread Marcus Eriksson
It is probably this: https://issues.apache.org/jira/browse/CASSANDRA-9549

On Wed, Jun 17, 2015 at 7:37 PM, Michał Łowicki mlowi...@gmail.com wrote:

 Looks like the memtable heap size is growing rapidly on some nodes (
 https://www.dropbox.com/s/3brloiy3fqang1r/Screenshot%202015-06-17%2019.21.49.png?dl=0).
 The drops are the points where nodes were restarted.

 On Wed, Jun 17, 2015 at 6:53 PM, Michał Łowicki mlowi...@gmail.com
 wrote:

 Hi,

 Two datacenters with 6 nodes (2.1.6) each. In each DC garbage collection
 is launched at the same time on each node (See [1] for total GC duration
 per 5 seconds). RF is set to 3. Any ideas?

 [1]
 https://www.dropbox.com/s/bsbyew1jxbe3dgo/Screenshot%202015-06-17%2018.49.48.png?dl=0

 --
 BR,
 Michał Łowicki




 --
 BR,
 Michał Łowicki



Re: Garbage collector launched on all nodes at once

2015-06-17 Thread Michał Łowicki
Looks like the memtable heap size is growing rapidly on some nodes (
https://www.dropbox.com/s/3brloiy3fqang1r/Screenshot%202015-06-17%2019.21.49.png?dl=0).
The drops are the points where nodes were restarted.

On Wed, Jun 17, 2015 at 6:53 PM, Michał Łowicki mlowi...@gmail.com wrote:

 Hi,

 Two datacenters with 6 nodes (2.1.6) each. In each DC garbage collection
 is launched at the same time on each node (See [1] for total GC duration
 per 5 seconds). RF is set to 3. Any ideas?

 [1]
 https://www.dropbox.com/s/bsbyew1jxbe3dgo/Screenshot%202015-06-17%2018.49.48.png?dl=0

 --
 BR,
 Michał Łowicki




-- 
BR,
Michał Łowicki


Re: Garbage collector launched on all nodes at once

2015-06-17 Thread Jason Wee
Okay, IIRC memtables have been moved off heap; googled and found this:
http://www.datastax.com/dev/blog/off-heap-memtables-in-Cassandra-2-1
Apparently, some references are still kept on heap.
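
For reference, a minimal cassandra.yaml sketch of the 2.1 memtable allocation
settings that blog post describes (the values here are illustrative, not
recommendations):

    # heap_buffers keeps memtables fully on heap (the default);
    # offheap_buffers / offheap_objects move buffers or whole cells off heap,
    # though some per-cell references stay on heap either way.
    memtable_allocation_type: offheap_objects
    memtable_heap_space_in_mb: 2048      # on-heap memtable budget
    memtable_offheap_space_in_mb: 2048   # off-heap memtable budget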

On Thu, Jun 18, 2015 at 1:11 PM, Marcus Eriksson krum...@gmail.com wrote:

 It is probably this: https://issues.apache.org/jira/browse/CASSANDRA-9549

 On Wed, Jun 17, 2015 at 7:37 PM, Michał Łowicki mlowi...@gmail.com
 wrote:

 Looks like the memtable heap size is growing rapidly on some nodes (
 https://www.dropbox.com/s/3brloiy3fqang1r/Screenshot%202015-06-17%2019.21.49.png?dl=0).
 The drops are the points where nodes were restarted.

 On Wed, Jun 17, 2015 at 6:53 PM, Michał Łowicki mlowi...@gmail.com
 wrote:

 Hi,

 Two datacenters with 6 nodes (2.1.6) each. In each DC garbage collection
 is launched at the same time on each node (See [1] for total GC duration
 per 5 seconds). RF is set to 3. Any ideas?

 [1]
 https://www.dropbox.com/s/bsbyew1jxbe3dgo/Screenshot%202015-06-17%2018.49.48.png?dl=0

 --
 BR,
 Michał Łowicki




 --
 BR,
 Michał Łowicki





Minor compaction not triggered

2015-06-17 Thread Jayapandian Ponraj
Hi

I have a Cassandra cluster of 6 nodes, with DateTiered compaction for
the tables/CFs.
For some reason, minor compaction never happens.
I have enabled debug logging and I don't see any debug log lines related to
compaction, such as the following:

https://github.com/apache/cassandra/blob/cassandra-2.0/src/java/org/apache/cassandra/db/compaction/CompactionManager.java#L150
https://github.com/apache/cassandra/blob/cassandra-2.0/src/java/org/apache/cassandra/db/compaction/DateTieredCompactionStrategy.java#L127

As a result of no compactions, the cluster now has more than 50K
SSTables per node.
How do I debug this issue further?
Appreciate any help.
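
One way to dig further (a sketch from memory, so double-check the commands
against your version; the keyspace/table names are placeholders): see what the
node reports about pending compactions and SSTable counts, and turn on debug
logging for the compaction package, which on 2.0 is configured in
log4j-server.properties:

    # pending/active compactions and per-table SSTable counts
    nodetool compactionstats
    nodetool cfstats my_keyspace.my_table

    # conf/log4j-server.properties (Cassandra 2.0): debug logs for compaction
    log4j.logger.org.apache.cassandra.db.compaction=DEBUG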


Re: Connection reset during repair service

2015-06-17 Thread Alain RODRIGUEZ
Hi David, Edouard,

Depending on your data model on event_data, you might want to consider
upgrading to use DTCS (C* 2.0.11+).

Basically, if those tombstones are due to a constant TTL and this is a
time series, it could be a real improvement.

See:
https://labs.spotify.com/2014/12/18/date-tiered-compaction/
http://www.datastax.com/dev/blog/datetieredcompactionstrategy
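
If you do switch, the change is a single ALTER on the table; a sketch (the
option values are illustrative and should be tuned to your TTL and write
pattern):

    ALTER TABLE rgsupv.event_data
      WITH compaction = {
        'class': 'DateTieredCompactionStrategy',
        'base_time_seconds': '3600',
        'max_sstable_age_days': '365'
      };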

I am not sure this is related to your problem but having 8904 tombstones
read at once is pretty bad. Also you might want to paginate queries a bit
since it looks like you retrieve a lot of data at once.

Meanwhile, if you are using STCS you can consider performing major
compaction on a regular basis (taking into consideration major compaction
downsides)

C*heers,

Alain





2015-06-12 15:08 GMT+02:00 David CHARBONNIER david.charbonn...@rgsystem.com
:

  Hi,



 We’re using Cassandra 2.0.8.39 through Datastax Enterprise 4.5.1 and we’re
 experiencing issues with OPSCenter (version 5.1.3) Repair Service.

 When Repair Service is running, we can see repair timing out on a few
 ranges in OPSCenter’s event log viewer. See screenshot attached.



 On our Cassandra nodes, we can see a lot of these messages in the
 cassandra/system.log log file while a timeout shows up in OPSCenter:



 ERROR [Native-Transport-Requests:3372] 2015-06-12
 02:22:33,231 ErrorMessage.java (line 222) Unexpected exception during
 request

 java.io.IOException: Connection reset by peer

 at sun.nio.ch.FileDispatcherImpl.read0(Native Method)

 at sun.nio.ch.SocketDispatcher.read(Unknown Source)

 at sun.nio.ch.IOUtil.readIntoNativeBuffer(Unknown Source)

 at sun.nio.ch.IOUtil.read(Unknown Source)

 at sun.nio.ch.SocketChannelImpl.read(Unknown Source)

 at
 org.jboss.netty.channel.socket.nio.NioWorker.read(NioWorker.java:64)

 at
 org.jboss.netty.channel.socket.nio.AbstractNioWorker.process(AbstractNioWorker.java:109)

 at
 org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:312)

 at
 org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:90)

 at
 org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)

 at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown
 Source)

 at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown
 Source)

 at java.lang.Thread.run(Unknown Source)



 You’ll find attached an extract of the system.log file with some more
 information.



 Do you have any idea what’s happening?



 We suspect the timeouts happen because we have some tables with many
 tombstones, and a warning is sometimes triggered. We have edited the
 configuration so the warning is still emitted but queries keep running until
 1,000,000 tombstones are encountered.
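
 For reference, the two knobs being described live in cassandra.yaml; a sketch,
 where the warn value is the 2.0 default and the failure value is the 1,000,000
 figure mentioned above:

     # warn in the log once a single query reads this many tombstones...
     tombstone_warn_threshold: 1000
     # ...but only abort the query once it reaches this many
     tombstone_failure_threshold: 1000000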



 During a compaction, we also see warning messages telling us that we have a
 lot of tombstones:



 WARN [CompactionExecutor:1584] 2015-06-11 19:22:24,904
 SliceQueryFilter.java (line 225) Read 8640 live and 8904 tombstoned cells
 in rgsupv.event_data (see tombstone_warn_threshold). 1 columns was
 requested, slices=[-], delInfo={deletedAt=-9223372036854775808,
 localDeletion=2147483647}



 Do you think it’s related to our first problem?



 Our cluster is configured as follows:

 -  8 nodes with Debian 7.8 x64

 -  16 GB of memory and 4 CPUs

 -  2 HDDs: 1 for the system and the other for the data directory



 Best regards,



 *David CHARBONNIER*

 Sysadmin

 T : +33 411 934 200

 david.charbonn...@rgsystem.com

 ZAC Aéroport

 125 Impasse Adam Smith

 34470 Pérols - France

 *www.rgsystem.com* http://www.rgsystem.com/









Re: Using Cassandra and Twisted (Python)

2015-06-17 Thread Jonathan Ballet

Hello Alex,

thanks for your answer! I'll try posting there as well then!

Best,

 Jonathan


On 06/16/2015 07:05 PM, Alex Popescu wrote:

Jonathan,

I'm pretty sure you'll have a better chance of getting this answered on the
Python driver mailing list:
https://groups.google.com/a/lists.datastax.com/forum/#!forum/python-driver-user

On Tue, Jun 16, 2015 at 1:01 AM, Jonathan Ballet jbal...@gfproducts.ch
mailto:jbal...@gfproducts.ch wrote:

Hi,

I'd like to write some Python applications using Twisted to talk to
a Cassandra cluster.

It seems like the Datastax Python library from
https://github.com/datastax/python-driver does support Twisted, but
it's not exactly clear how I would use this library along with
Twisted. The documentation for the async API is very sparse and
there's no mention of how to plug this into the Twisted event loop.

Does anyone have a small working example on how to use both of these?

Thanks!

Jonathan




--
Bests,

Alex Popescu | @al3xandru
Sen. Product Manager @ DataStax
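
For the record, the driver ships a Twisted-based connection class; a minimal
sketch of how it is typically wired up (contact point and query are
placeholders, and exact behaviour may differ between driver versions):

    from cassandra.cluster import Cluster
    from cassandra.io.twistedreactor import TwistedConnection

    # Run the driver's I/O on Twisted instead of its default event loop.
    # (If the reactor isn't already running, the driver starts it in a
    # background thread -- check the docs for your driver version.)
    cluster = Cluster(['127.0.0.1'], connection_class=TwistedConnection)
    session = cluster.connect()

    def handle_rows(rows):
        for row in rows:
            print(row)

    def handle_error(exc):
        print('query failed: %s' % exc)

    # execute_async returns a ResponseFuture; attach callbacks instead of blocking.
    future = session.execute_async('SELECT release_version FROM system.local')
    future.add_callbacks(handle_rows, handle_error)

    future.result()      # block only so this standalone sketch exits cleanly
    cluster.shutdown()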



spark-sql estimates Cassandra table with 3 rows as 8 TB of data, Cassandra 2.1, DSE 4.7

2015-06-17 Thread Serega Sheypak
Hi, spark-sql estimated the input for a Cassandra table with 3 rows as 8 TB.
Sometimes it's estimated as -167 B.
I run it on a laptop; I don't have 8 TB of space for the data.

We use DSE 4.7 with bundled spark and spark-sql-thriftserver

Here are the stats for a dummy "select foo from bar", where bar has three rows
and several columns:


   - Total task time across all tasks: 7.6 min
   - Input: 8388608.0 TB

I don't have that many TB on my MacBook Pro. I would like to, but I don't :(


Re: Connection reset during repair service

2015-06-17 Thread Sebastian Estevez
Do you do a ton of random updates and deletes? That would not be a good
workload for DTCS.

Where are all your tombstones coming from?
 On Jun 17, 2015 3:43 AM, Alain RODRIGUEZ arodr...@gmail.com wrote:

 Hi David, Edouard,

 Depending on your data model on event_data, you might want to consider
 upgrading to use DTCS (C* 2.0.11+).

 Basically, if those tombstones are due to a constant TTL and this is a
 time series, it could be a real improvement.

 See:
 https://labs.spotify.com/2014/12/18/date-tiered-compaction/
 http://www.datastax.com/dev/blog/datetieredcompactionstrategy

 I am not sure this is related to your problem but having 8904 tombstones
 read at once is pretty bad. Also you might want to paginate queries a bit
 since it looks like you retrieve a lot of data at once.

 Meanwhile, if you are using STCS you can consider performing major
 compaction on a regular basis (taking into consideration major compaction
 downsides)

 C*heers,

 Alain





 2015-06-12 15:08 GMT+02:00 David CHARBONNIER 
 david.charbonn...@rgsystem.com:

  Hi,



 We’re using Cassandra 2.0.8.39 through Datastax Enterprise 4.5.1 and
 we’re experiencing issues with OPSCenter (version 5.1.3) Repair Service.

 When Repair Service is running, we can see repair timing out on a few
 ranges in OPSCenter’s event log viewer. See screenshot attached.



 On our Cassandra nodes, we can see a lot of these messages in the
 cassandra/system.log log file while a timeout shows up in OPSCenter:



 ERROR [Native-Transport-Requests:3372] 2015-06-12
 02:22:33,231 ErrorMessage.java (line 222) Unexpected exception during
 request

 java.io.IOException: Connection reset by peer

 at sun.nio.ch.FileDispatcherImpl.read0(Native Method)

 at sun.nio.ch.SocketDispatcher.read(Unknown Source)

 at sun.nio.ch.IOUtil.readIntoNativeBuffer(Unknown Source)

 at sun.nio.ch.IOUtil.read(Unknown Source)

 at sun.nio.ch.SocketChannelImpl.read(Unknown Source)

 at
 org.jboss.netty.channel.socket.nio.NioWorker.read(NioWorker.java:64)

 at
 org.jboss.netty.channel.socket.nio.AbstractNioWorker.process(AbstractNioWorker.java:109)

 at
 org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:312)

 at
 org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:90)

 at
 org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)

 at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown
 Source)

 at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown
 Source)

 at java.lang.Thread.run(Unknown Source)



 You’ll find attached an extract of the system.log file with some more
 information.



 Do you have any idea what’s happening?



 We suspect the timeouts happen because we have some tables with many
 tombstones, and a warning is sometimes triggered. We have edited the
 configuration so the warning is still emitted but queries keep running until
 1,000,000 tombstones are encountered.



 During a compaction, we also see warning messages telling us that we have a
 lot of tombstones:



 WARN [CompactionExecutor:1584] 2015-06-11 19:22:24,904
 SliceQueryFilter.java (line 225) Read 8640 live and 8904 tombstoned cells
 in rgsupv.event_data (see tombstone_warn_threshold). 1 columns was
 requested, slices=[-], delInfo={deletedAt=-9223372036854775808,
 localDeletion=2147483647}



 Do you think it’s related to our first problem?



 Our cluster is configured as follows:

 -  8 nodes with Debian 7.8 x64

 -  16 GB of memory and 4 CPUs

 -  2 HDDs: 1 for the system and the other for the data directory



 Best regards,



 *David CHARBONNIER*

 Sysadmin

 T : +33 411 934 200

 david.charbonn...@rgsystem.com

 ZAC Aéroport

 125 Impasse Adam Smith

 34470 Pérols - France

 *www.rgsystem.com* http://www.rgsystem.com/











Re: Connection reset during repair service

2015-06-17 Thread Alain RODRIGUEZ
Regarding the DataStax repair service, I saw the same error over here.

Here is the DataStax answer, FWIW:

The repair service timeout message is telling you that the service has not
received a response from the nodetool repair process running on Cassandra
within the configured (default) 3600 seconds. When this happens, the
OpsCenter repair service stops monitoring the progress and places the subrange
repair request at the back of a queue to be re-run at a later time.
It is not necessarily indicative of a repair failure, but it does suggest that
the repair process is taking longer than expected for some reason,
typically due to a hang, network issues, or wide rows on the table being
repaired.

As a possible workaround you can increase the timeout value in OpsCenter by
raising the timeout period in opscenterd.conf or cluster_name.conf
(the cluster file takes precedence), but if there is an underlying issue with
repairs completing on Cassandra this will not help.

single_repair_timeout = 3600

(see:
http://docs.datastax.com/en/opscenter/4.1/opsc/online_help/services/repairServiceAdvancedConfiguration.html
).
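
If memory serves, that setting goes under a [repair_service] section in
whichever of those two files you use; a sketch (the 7200 value is just an
example):

    # opscenterd.conf or the per-cluster cluster_name.conf
    [repair_service]
    single_repair_timeout = 7200   # seconds to wait for one subrange repair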




2015-06-17 15:21 GMT+02:00 Sebastian Estevez sebastian.este...@datastax.com
:

 Do you do a ton of random updates and deletes? That would not be a good
 workload for DTCS.

 Where are all your tombstones coming from?
  On Jun 17, 2015 3:43 AM, Alain RODRIGUEZ arodr...@gmail.com wrote:

 Hi David, Edouard,

 Depending on your data model on event_data, you might want to consider
 upgrading to use DTCS (C* 2.0.11+).

 Basically, if those tombstones are due to a constant TTL and this is a
 time series, it could be a real improvement.

 See:
 https://labs.spotify.com/2014/12/18/date-tiered-compaction/
 http://www.datastax.com/dev/blog/datetieredcompactionstrategy

 I am not sure this is related to your problem but having 8904 tombstones
 read at once is pretty bad. Also you might want to paginate queries a bit
 since it looks like you retrieve a lot of data at once.

 Meanwhile, if you are using STCS you can consider performing major
 compaction on a regular basis (taking into consideration major compaction
 downsides)

 C*heers,

 Alain





 2015-06-12 15:08 GMT+02:00 David CHARBONNIER 
 david.charbonn...@rgsystem.com:

  Hi,



 We’re using Cassandra 2.0.8.39 through Datastax Enterprise 4.5.1 and
 we’re experiencing issues with OPSCenter (version 5.1.3) Repair Service.

 When Repair Service is running, we can see repair timing out on a few
 ranges in OPSCenter’s event log viewer. See screenshot attached.



 On our Cassandra nodes, we can see a lot of these messages in the
 cassandra/system.log log file while a timeout shows up in OPSCenter:



 ERROR [Native-Transport-Requests:3372] 2015-06-12
 02:22:33,231 ErrorMessage.java (line 222) Unexpected exception during
 request

 java.io.IOException: Connection reset by peer

 at sun.nio.ch.FileDispatcherImpl.read0(Native Method)

 at sun.nio.ch.SocketDispatcher.read(Unknown Source)

 at sun.nio.ch.IOUtil.readIntoNativeBuffer(Unknown Source)

 at sun.nio.ch.IOUtil.read(Unknown Source)

 at sun.nio.ch.SocketChannelImpl.read(Unknown Source)

 at
 org.jboss.netty.channel.socket.nio.NioWorker.read(NioWorker.java:64)

 at
 org.jboss.netty.channel.socket.nio.AbstractNioWorker.process(AbstractNioWorker.java:109)

 at
 org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:312)

 at
 org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:90)

 at
 org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)

 at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown
 Source)

 at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown
 Source)

 at java.lang.Thread.run(Unknown Source)



 You’ll find attached an extract of the system.log file with some more
 information.



 Do you have any idea what’s happening?



 We suspect the timeouts happen because we have some tables with many
 tombstones, and a warning is sometimes triggered. We have edited the
 configuration so the warning is still emitted but queries keep running until
 1,000,000 tombstones are encountered.



 During a compaction, we also see warning messages telling us that we have a
 lot of tombstones:



 WARN [CompactionExecutor:1584] 2015-06-11 19:22:24,904
 SliceQueryFilter.java (line 225) Read 8640 live and 8904 tombstoned cells
 in rgsupv.event_data (see tombstone_warn_threshold). 1 columns was
 requested, slices=[-], delInfo={deletedAt=-9223372036854775808,
 localDeletion=2147483647}



 Do you think it’s related to our first problem?



 Our cluster is configured as follows:

 -  8 nodes with Debian 7.8 x64

 -  16 GB of memory and 4 CPUs

 -  2 HDDs: 1 for the system and the other for the data directory



 Best regards,



 *David CHARBONNIER*

 Sysadmin

 T : +33 411 934 200

 

Garbage collector launched on all nodes at once

2015-06-17 Thread Michał Łowicki
Hi,

Two datacenters with 6 nodes (2.1.6) each. In each DC garbage collection is
launched at the same time on each node (See [1] for total GC duration per 5
seconds). RF is set to 3. Any ideas?

[1]
https://www.dropbox.com/s/bsbyew1jxbe3dgo/Screenshot%202015-06-17%2018.49.48.png?dl=0

-- 
BR,
Michał Łowicki