Re: Open source equivalents of OpsCenter

2016-07-14 Thread Michał Łowicki
My experience while looking for a replacement on
https://medium.com/@mlowicki/alternatives-to-datastax-opscenter-8ad893efe063
On Thursday, 14 July 2016, Stefano Ortolani <ostef...@gmail.com> wrote:

> Replaced OpsCenter with a mix of:
>
> * metrics-graphite-3.1.0.jar installed in the same classpath of C*
> * Custom script to push system metrics (cpu/mem/io)
> * Grafana to create the dashboard
> * Custom repairs script
>
> Still not optimal but getting there...
>
> Stefano
>
> On Thu, Jul 14, 2016 at 10:18 AM, Romain Hardouin <romainh...@yahoo.fr> wrote:
>
>> Hi Juho,
>>
>> Out of curiosity, which stack did you use to make your dashboard?
>>
>> Romain
>>
>> On Thursday, 14 July 2016 at 10:43, Juho Mäkinen <juho.maki...@gmail.com> wrote:
>>
>>
>> I'm doing some work on replacing OpsCenter in our setup. I ended up creating
>> a Docker container which contains the following features:
>>
>>  - Cassandra 2.2.7
>>  - MX4J (a JMX to REST bridge) as a java-agent
>>  - metrics-graphite-3.1.0.jar (export some but not all JMX to graphite)
>>  - a custom Ruby script which uses MX4J to export to Graphite some JMX
>> metrics which we don't otherwise get.
>>
>> With this I get all our Cassandra instances and their JMX-exposed data
>> into Graphite, which allows us to use Grafana and Graphite to draw
>> pretty dashboards.
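A rough Python sketch of that MX4J-to-Graphite idea, for illustration; the MX4J
URL, the response parsing and the metric name are assumptions, not the actual
Ruby script:

# Sketch: read one JMX value via MX4J's HTTP interface and forward it using
# Graphite's plaintext protocol. Adjust the URL/parsing to what MX4J exposes.
import socket
import time
import urllib2  # Python 2, matching the era of this thread

MX4J = ("http://localhost:8081/mbean?objectname="
        "org.apache.cassandra.metrics:type=Compaction,name=PendingTasks")
GRAPHITE = ("graphite.example.com", 2003)

def read_pending_tasks():
    body = urllib2.urlopen(MX4J).read()
    # extract the attribute value from MX4J's XML/HTML response here
    return 0  # placeholder

line = "cassandra.node1.compaction.pending_tasks %s %d\n" % (
    read_pending_tasks(), int(time.time()))
sock = socket.create_connection(GRAPHITE)
sock.sendall(line)
sock.close()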
>>
>> In addition I started writing some code which currently provides the
>> following features:
>>  - A dashboard which provides a ring view similar to what OpsCenter does,
>> with onMouseOver features to display more info on each node.
>>  - A simple HTTP GET/POST based API to:
>>     - Set up a new non-vnode based cluster
>>     - Get a JSON blob with cluster information, all its tokens, machines
>> and so on
>>     - An API for new cluster instances so that they can get a token slot
>> from the ring when they boot.
>>     - An option to kill a dead node and mark its slot for replacement, so the
>> new booting node can use the cassandra.replace_address option.
>>
>> The node is not yet packaged in any way for distribution and some parts
>> depend on our Chef installation, but if there's interest I can publish at
>> least some parts from it.
>>
>>  - Garo
>>
>> On Thu, Jul 14, 2016 at 10:54 AM, Romain Hardouin <romainh...@yahoo.fr> wrote:
>>
>> Do you run C* on physical machines or in the cloud? If the topology
>> doesn't change too often you can have a look at Zabbix. The downside is that
>> you have to set up all the JMX metrics yourself... but that's also a plus
>> because you can have custom metrics. If you want nice graphs/dashboards you
>> can use Grafana to plot Zabbix data. (We're also using a SaaS but that's not
>> open source.)
>> For the rolling restart and other admin stuff we're using Rundeck. It's a
>> great tool when working in a team.
>>
>> (I think it's time to implement an open source alternative to OpsCenter.
>> If some guys are interested I'm in.)
>>
>> Best,
>>
>> Romain
>>
>>
>>
>>
>> On Thursday, 14 July 2016 at 00:01, Ranjib Dey <dey.ran...@gmail.com> wrote:
>>
>>
>> we use datadog (metrics emitted as raw statsd) for the dashboard. All
>> repair & compaction is done via blender & serf[1].
>> [1]https://github.com/pagerduty/blender
>>
>>
>> On Wed, Jul 13, 2016 at 2:42 PM, Kevin O'Connor <ke...@reddit.com> wrote:
>>
>> Now that OpsCenter doesn't work with open source installs, are there any
>> runs at an open source equivalent? I'd be more interested in looking at
>> metrics of a running cluster and doing other tasks like managing
>> repairs/rolling restarts more so than historical data.
>>
>>
>>
>>
>>
>>
>>
>>
>

-- 
BR,
Michał Łowicki


Re: Cassandra monitoring

2016-06-14 Thread Michał Łowicki
My team ended up with Diamond / StatsD / Graphite / Grafana (more
background in
medium.com/@mlowicki/alternatives-to-datastax-opscenter-8ad893efe063).
We're relying on such stack heavily in other projects and our infra in
general.
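For anyone unfamiliar with that stack, feeding it a custom metric is just a UDP
datagram to StatsD; a minimal sketch (host and metric name are made up):

# Minimal StatsD gauge push over UDP; host and metric name are made up.
import socket

STATSD = ("statsd.example.com", 8125)

def gauge(name, value):
    payload = "%s:%s|g" % (name, value)
    socket.socket(socket.AF_INET, socket.SOCK_DGRAM).sendto(payload.encode(), STATSD)

gauge("cassandra.db1.pending_compactions", 42)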

On Tue, Jun 14, 2016 at 10:29 PM, Jonathan Haddad <j...@jonhaddad.com> wrote:

> OpsCenter going forward is limited to datastax enterprise versions.
>
> I know a lot of people like DataDog, but I haven't used it.  Maybe other
> people on the list can speak from recent first-hand experience on its pros
> and cons.
>
>
> On Tue, Jun 14, 2016 at 1:20 PM Arun Ramakrishnan <
> sinchronized.a...@gmail.com> wrote:
>
>> Thanks Jonathan.
>>
>> Out of curiosity, does OpsCenter support some later version of Cassandra
>> that is not OSS?
>>
>> Well, the most minimal requirement is that I want to be able to monitor
>> cluster health and hook this info into some alerting platform. We are AWS
>> heavy. We rely really heavily on AWS CloudWatch for our metrics as of now.
>> We prefer not to spend our time setting up additional tools if we can help
>> it. So, if we needed a 3rd party service we would consider an APM or
>> monitoring service that is on the cheaper side.
>>
>>
>>
>>
>> On Tue, Jun 14, 2016 at 12:20 PM, Jonathan Haddad <j...@jonhaddad.com>
>> wrote:
>>
>>> Depends what you want to monitor.  I wouldn't use a lesser version of
>>> Cassandra just for OpsCenter; it doesn't give you a ton you can't get
>>> elsewhere and it's not ever going to support OSS > 2.1, so you kind of limit
>>> yourself to a pretty old version of Cassandra for no good reason.
>>>
>>> What else do you use for monitoring in your infra?  I've used a mix of
>>> OSS tools (nagios, statsd, graphite, ELK), and hosted solutions. The nice
>>> part about them is that you can monitor your whole stack in a single UI,
>>> not just your database.
>>>
>>> On Tue, Jun 14, 2016 at 12:10 PM Arun Ramakrishnan <
>>> sinchronized.a...@gmail.com> wrote:
>>>
>>>> What are the options for a very small and nimble startup to keep a
>>>> Cassandra cluster running well oiled? We are on AWS. We are interested in a
>>>> monitoring tool and potentially also cluster management tools.
>>>>
>>>> We are currently on Apache Cassandra 3.7. We were hoping the DataStax
>>>> OpsCenter would be it (it is free for startups our size), but it looks like
>>>> it does not support Cassandra versions greater than v2.1. That's pretty
>>>> surprising considering Cassandra v2.1 came out in 2014.
>>>>
>>>> We would consider downgrading to DataStax Cassandra 2.1 just to have
>>>> robust monitoring tools, but I am not sure having OpsCenter offsets all
>>>> the improvements that have been added to Cassandra since 2.1.
>>>>
>>>> Sematext has integrations for monitoring Cassandra. Does anyone have
>>>> good experience with them?
>>>>
>>>> How much work would be involved in setting up Ganglia or some such option
>>>> for Cassandra?
>>>>
>>>> Thanks,
>>>> Arun
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>


-- 
BR,
Michał Łowicki


Re: Replacing disks

2016-02-29 Thread Michał Łowicki
On Mon, Feb 29, 2016 at 8:52 AM, Alain RODRIGUEZ <arodr...@gmail.com> wrote:

> I wrote that a few days ago:
> http://thelastpickle.com/blog/2016/02/25/removing-a-disk-mapping-from-cassandra.html
>
> I believe this might help you.
>

Yes, looks promising. Thanks!


> C*heers,
> ---
>
> Alain Rodriguez - al...@thelastpickle.com
> France
>
> The Last Pickle - Apache Cassandra Consulting
> http://www.thelastpickle.com
> On 28 Feb 2016 at 15:17, "Clint Martin" <clintlmar...@coolfiretechnologies.com> wrote:
>
>> Code-wise, I am not completely familiar with what accomplishes the
>> behavior, but my understanding and experience is that Cass 2.1 picks the
>> drive with the most free space when picking a destination for a compaction
>> operation.
>> (This is an overly simplistic description. Reality is always more
>> nuanced. DataStax had a blog post that describes this better, as well as
>> limitations of the algorithm in 2.1 which are addressed in the 3.x
>> releases.)
>>
>> Clint
>> On Feb 28, 2016 10:11 AM, "Michał Łowicki" <mlowi...@gmail.com> wrote:
>>
>>>
>>>
>>> On Sun, Feb 28, 2016 at 4:00 PM, Clint Martin <
>>> clintlmar...@coolfiretechnologies.com> wrote:
>>>
>>>> Your plan for replacing your 200gb drive sounds good to me. Since you
>>>> are running jbod, I wouldn't worry about manually redistributing data from
>>>> your other disk to the new one. Cassandra will do that for you as it
>>>> performs compaction.
>>>>
>>>
>>> Is this done by pickWriteableDirectory
>>> <https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/db/Directories.java#L386>
>>> ?
>>>
>>>> While you're doing the drive change, you need to complete the swap and
>>>> restart of the node before the hinted handoff window expires on the other
>>>> nodes. If you do not complete in time, you'll want to perform a repair on
>>>> the node.
>>>>
>>>
>>> Yes. Thanks!
>>>
>>>
>>>>
>>>>
>>>> Clint
>>>> On Feb 28, 2016 9:33 AM, "Michał Łowicki" <mlowi...@gmail.com> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> I've two disks on single box (500GB + 200GB). data_file_directories
>>>>> in cassandra.yaml has two entries. I would like to replace 200GB with 
>>>>> 500GB
>>>>> as it's running out of space and to align it with others we've in the
>>>>> cluster. The plan is to stop C*, attach new disk, move data from 200GB to
>>>>> new one and mount it at the same point in the hierarchy. When done start 
>>>>> C*.
>>>>>
>>>>> Additionally I would like to move some data from the old 500GB to the
>>>>> new one to distribute used disk space equally. Probably all related files
>>>>> for single SSTable should be moved i.e.
>>>>>
>>>>> foo-bar-ka-1630184-CompressionInfo.db
>>>>>
>>>>> foo-bar-ka-1630184-Data.db
>>>>>
>>>>> foo-bar-ka-1630184-Digest.sha1
>>>>>
>>>>> foo-bar-ka-1630184-Filter.db
>>>>>
>>>>> foo-bar-ka-1630184-Index.db
>>>>>
>>>>> foo-bar-ka-1630184-Statistics.db
>>>>>
>>>>> foo-bar-ka-1630184-Summary.db
>>>>>
>>>>> foo-bar-ka-1630184-TOC.txt
>>>>>
>>>>> Is this something which should work or you see some obstacles? (C*
>>>>> 2.1.13).
>>>>> --
>>>>> BR,
>>>>> Michał Łowicki
>>>>>
>>>>
>>>
>>>
>>> --
>>> BR,
>>> Michał Łowicki
>>>
>>


-- 
BR,
Michał Łowicki


Re: Replacing disks

2016-02-28 Thread Michał Łowicki
On Sun, Feb 28, 2016 at 4:00 PM, Clint Martin <
clintlmar...@coolfiretechnologies.com> wrote:

> Your plan for replacing your 200gb drive sounds good to me. Since you are
> running jbod, I wouldn't worry about manually redistributing data from your
> other disk to the new one. Cassandra will do that for you as it performs
> compaction.
>

Is this done by pickWriteableDirectory
<https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/db/Directories.java#L386>
?
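(For intuition, the heuristic Clint describes boils down to something like the
toy sketch below; Cassandra's actual Directories logic is more involved, so
treat this purely as an illustration.)

# Toy illustration of "pick the data directory with the most free space".
# Not Cassandra's actual implementation.
import os

def pick_writeable_directory(data_file_directories):
    def free_bytes(path):
        st = os.statvfs(path)
        return st.f_bavail * st.f_frsize
    return max(data_file_directories, key=free_bytes)

print(pick_writeable_directory(["/data1/cassandra", "/data2/cassandra"]))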

> While you're doing the drive change, you need to complete the swap and
> restart of the node before the hinted handoff window expires on the other
> nodes. If you do not complete in time, you'll want to perform a repair on
> the node.
>

Yes. Thanks!


>
>
> Clint
> On Feb 28, 2016 9:33 AM, "Michał Łowicki" <mlowi...@gmail.com> wrote:
>
>> Hi,
>>
>> I've two disks on single box (500GB + 200GB). data_file_directories in
>> cassandra.yaml has two entries. I would like to replace 200GB with 500GB as
>> it's running out of space and to align it with others we've in the cluster.
>> The plan is to stop C*, attach new disk, move data from 200GB to new one
>> and mount it at the same point in the hierarchy. When done start C*.
>>
>> Additionally I would like to move some data from the old 500GB to the new
>> one to distribute used disk space equally. Probably all related files for
>> single SSTable should be moved i.e.
>>
>> foo-bar-ka-1630184-CompressionInfo.db
>>
>> foo-bar-ka-1630184-Data.db
>>
>> foo-bar-ka-1630184-Digest.sha1
>>
>> foo-bar-ka-1630184-Filter.db
>>
>> foo-bar-ka-1630184-Index.db
>>
>> foo-bar-ka-1630184-Statistics.db
>>
>> foo-bar-ka-1630184-Summary.db
>>
>> foo-bar-ka-1630184-TOC.txt
>>
>> Is this something which should work or you see some obstacles? (C*
>> 2.1.13).
>> --
>> BR,
>> Michał Łowicki
>>
>


-- 
BR,
Michał Łowicki


Replacing disks

2016-02-28 Thread Michał Łowicki
Hi,

I've got two disks in a single box (500GB + 200GB). data_file_directories in
cassandra.yaml has two entries. I would like to replace the 200GB disk with a
500GB one as it's running out of space, and to align the box with the others
we've got in the cluster. The plan is to stop C*, attach the new disk, move
data from the 200GB disk to the new one and mount it at the same point in the
hierarchy, then start C* again.

Additionally I would like to move some data from the old 500GB disk to the new
one to distribute used disk space equally. Presumably all files belonging to a
single SSTable should be moved together, i.e.:

foo-bar-ka-1630184-CompressionInfo.db

foo-bar-ka-1630184-Data.db

foo-bar-ka-1630184-Digest.sha1

foo-bar-ka-1630184-Filter.db

foo-bar-ka-1630184-Index.db

foo-bar-ka-1630184-Statistics.db

foo-bar-ka-1630184-Summary.db

foo-bar-ka-1630184-TOC.txt

Is this something which should work, or do you see some obstacles? (C* 2.1.13).
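A sketch of the "move all components of one SSTable together" step, assuming
hypothetical mount points and keyspace/table paths (run it only with C*
stopped):

# Move every component of one SSTable generation from one data dir to another.
import glob
import os
import shutil

SRC = "/mnt/disk200/cassandra/data/foo/bar"   # old data directory (example)
DST = "/mnt/disk500b/cassandra/data/foo/bar"  # new data directory (example)
GENERATION = "foo-bar-ka-1630184"

for component in glob.glob(os.path.join(SRC, GENERATION + "-*")):
    shutil.move(component, os.path.join(DST, os.path.basename(component)))
    print("moved %s" % os.path.basename(component))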
-- 
BR,
Michał Łowicki


Re: Increase compaction performance

2016-02-12 Thread Michał Łowicki
I had to decrease streaming throughput to 10 (from the default 200) in order to
avoid the effect of a rising number of SSTables and compaction tasks while
running repair. It's working very slowly but it's stable and doesn't hurt the
whole cluster. Will try to adjust the configuration gradually to see if I can
make it any better. Thanks!
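In case it helps anyone doing the same gradual tuning, a sketch of stepping the
stream throughput back up while watching pending compactions (thresholds and
sleep time are made up; the nodetool subcommands are the standard ones):

# Raise stream throughput in steps and back off if pending compactions climb.
import re
import subprocess
import time

def pending_compactions():
    out = subprocess.check_output(["nodetool", "compactionstats"]).decode()
    m = re.search(r"pending tasks:\s*(\d+)", out)
    return int(m.group(1)) if m else 0

for throughput in (10, 25, 50, 100):
    subprocess.check_call(["nodetool", "setstreamthroughput", str(throughput)])
    time.sleep(1800)  # let repair streaming run for a while
    if pending_compactions() > 100:
        subprocess.check_call(["nodetool", "setstreamthroughput", "10"])
        break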

On Thu, Feb 11, 2016 at 8:10 PM, Michał Łowicki <mlowi...@gmail.com> wrote:

>
>
> On Thu, Feb 11, 2016 at 5:38 PM, Alain RODRIGUEZ <arodr...@gmail.com>
> wrote:
>
>> Also, are you using incremental repairs (not sure about the available
>> options in Spotify Reaper) what command did you run ?
>>
>>
> No.
>
>
>> 2016-02-11 17:33 GMT+01:00 Alain RODRIGUEZ <arodr...@gmail.com>:
>>
>>> CPU load is fine, SSD disks below 30% utilization, no long GC pauses
>>>
>>>
>>>
>>> What is your current compaction throughput ?  The current value of
>>> 'concurrent_compactors' (cassandra.yaml or through JMX) ?
>>>
>>
>
> Throughput was initially set to 1024 and I've gradually increased it to
> 2048, 4K and 16K but haven't seen any changes. Tried to change it both from
> `nodetool` and also cassandra.yaml (with restart after changes).
>
>
>>
>>> nodetool getcompactionthroughput
>>>
>>> How to speed up compaction? Increased compaction throughput and
>>>> concurrent compactors but no change. Seems there is plenty idle
>>>> resources but can't force C* to use it.
>>>>
>>>
>>> You might want to try to un-throttle the compaction throughput through:
>>>
>>> nodetool setcompactionthroughput 0
>>>
>>> Choose a canary node. Monitor pending compactions and disk throughput
>>> (make sure the server is OK too - CPU...)
>>>
>>
>
> Yes, I'll try it out but if increasing it 16 times didn't help I'm a bit
> sceptical about it.
>
>
>>
>>> Some other information could be useful:
>>>
>>> What is your number of cores per machine and the compaction strategies
>>> for the 'most compacting' tables. What are write/update patterns, any TTL
>>> or tombstones ? Do you use a high number of vnodes ?
>>>
>>
> I'm using bare-metal box, 40CPU, 64GB, 2 SSD each. num_tokens is set to
> 256.
>
> Using LCS for all tables. Write / update heavy. No warnings about large
> number of tombstones but we're removing items frequently.
>
>
>
>>
>>> Also what is your repair routine and your values for gc_grace_seconds ?
>>> When was your last repair and do you think your cluster is suffering of a
>>> high entropy ?
>>>
>>
> We're having problem with repair for months (CASSANDRA-9935).
> gc_grace_seconds is set to 345600 now. Yes, as we haven't launched it
> successfully for long time I guess cluster is suffering of high entropy.
>
>
>>
>>> You can lower the stream throughput to make sure nodes can cope with
>>> what repairs are feeding them.
>>>
>>> nodetool getstreamthroughput
>>> nodetool setstreamthroughput X
>>>
>>
> Yes, this sounds interesting. As we've been having problems with repair for
> months, it could be that lots of data is transferred between nodes.
>
> Thanks!
>
>
>>
>>> C*heers,
>>>
>>> -
>>> Alain Rodriguez
>>> France
>>>
>>> The Last Pickle
>>> http://www.thelastpickle.com
>>>
>>> 2016-02-11 16:55 GMT+01:00 Michał Łowicki <mlowi...@gmail.com>:
>>>
>>>> Hi,
>>>>
>>>> Using 2.1.12 across 3 DCs. Each DC has 8 nodes. Trying to run repair
>>>> using Cassandra Reaper but nodes after couple of hours are full of pending
>>>> compaction tasks (regular not the ones about validation)
>>>>
>>>> CPU load is fine, SSD disks below 30% utilization, no long GC pauses.
>>>>
>>>> How to speed up compaction? Increased compaction throughput and
>>>> concurrent compactors but no change. Seems there is plenty idle
>>>> resources but can't force C* to use it.
>>>>
>>>> Any clue where there might be a bottleneck?
>>>>
>>>>
>>>> --
>>>> BR,
>>>> Michał Łowicki
>>>>
>>>>
>>>
>>
>
>
> --
> BR,
> Michał Łowicki
>



-- 
BR,
Michał Łowicki


Re: Increase compaction performance

2016-02-11 Thread Michał Łowicki
On Thu, Feb 11, 2016 at 5:38 PM, Alain RODRIGUEZ <arodr...@gmail.com> wrote:

> Also, are you using incremental repairs (not sure about the available
> options in Spotify Reaper) what command did you run ?
>
>
No.


> 2016-02-11 17:33 GMT+01:00 Alain RODRIGUEZ <arodr...@gmail.com>:
>
>> CPU load is fine, SSD disks below 30% utilization, no long GC pauses
>>
>>
>>
>> What is your current compaction throughput ?  The current value of
>> 'concurrent_compactors' (cassandra.yaml or through JMX) ?
>>
>

Throughput was initially set to 1024 and I've gradually increased it to
2048, 4K and 16K but haven't seen any changes. Tried to change it both from
`nodetool` and also cassandra.yaml (with restart after changes).


>
>> nodetool getcompactionthroughput
>>
>> How to speed up compaction? Increased compaction throughput and
>>> concurrent compactors but no change. Seems there is plenty idle
>>> resources but can't force C* to use it.
>>>
>>
>> You might want to try to un-throttle the compaction throughput through:
>>
>> nodetool setcompactionthroughput 0
>>
>> Choose a canary node. Monitor pending compactions and disk throughput
>> (make sure the server is OK too - CPU...)
>>
>

Yes, I'll try it out but if increasing it 16 times didn't help I'm a bit
sceptical about it.


>
>> Some other information could be useful:
>>
>> What is your number of cores per machine and the compaction strategies
>> for the 'most compacting' tables. What are write/update patterns, any TTL
>> or tombstones ? Do you use a high number of vnodes ?
>>
>
I'm using bare-metal boxes: 40 CPUs, 64GB RAM, 2 SSDs each. num_tokens is set
to 256.

Using LCS for all tables. Write / update heavy. No warnings about large
number of tombstones but we're removing items frequently.



>
>> Also what is your repair routine and your values for gc_grace_seconds ?
>> When was your last repair and do you think your cluster is suffering of a
>> high entropy ?
>>
>
We've been having problems with repair for months (CASSANDRA-9935).
gc_grace_seconds is set to 345600 now. Yes, as we haven't run it successfully
for a long time I guess the cluster is suffering from high entropy.


>
>> You can lower the stream throughput to make sure nodes can cope with what
>> repairs are feeding them.
>>
>> nodetool getstreamthroughput
>> nodetool setstreamthroughput X
>>
>
Yes, this sounds interesting. As we've been having problems with repair for
months, it could be that lots of data is transferred between nodes.

Thanks!


>
>> C*heers,
>>
>> -
>> Alain Rodriguez
>> France
>>
>> The Last Pickle
>> http://www.thelastpickle.com
>>
>> 2016-02-11 16:55 GMT+01:00 Michał Łowicki <mlowi...@gmail.com>:
>>
>>> Hi,
>>>
>>> Using 2.1.12 across 3 DCs. Each DC has 8 nodes. Trying to run repair
>>> using Cassandra Reaper but nodes after couple of hours are full of pending
>>> compaction tasks (regular not the ones about validation)
>>>
>>> CPU load is fine, SSD disks below 30% utilization, no long GC pauses.
>>>
>>> How to speed up compaction? Increased compaction throughput and
>>> concurrent compactors but no change. Seems there is plenty idle
>>> resources but can't force C* to use it.
>>>
>>> Any clue where there might be a bottleneck?
>>>
>>>
>>> --
>>> BR,
>>> Michał Łowicki
>>>
>>>
>>
>


-- 
BR,
Michał Łowicki


Increase compaction performance

2016-02-11 Thread Michał Łowicki
Hi,

Using 2.1.12 across 3 DCs. Each DC has 8 nodes. Trying to run repair using
Cassandra Reaper, but after a couple of hours nodes are full of pending
compaction tasks (regular ones, not the ones about validation).

CPU load is fine, SSD disks below 30% utilization, no long GC pauses.

How to speed up compaction? I increased compaction throughput and concurrent
compactors but saw no change. It seems there are plenty of idle resources but I
can't force C* to use them.

Any clue where there might be a bottleneck?


-- 
BR,
Michał Łowicki


Much less connected native clients after node join

2015-11-15 Thread Michał Łowicki
Hi,

I'm using Python Driver 2.7.2 connected to a C* 2.1.11 cluster in two DCs. I
had to reboot and rejoin one node and noticed that after a successful join
the number of connected native clients was much lower than on the other nodes
(blue line on the attached graph). It didn't fix itself after many hours, so I
restarted the newly joined node at ~9:50 and everything looked much better. I
guess the expected behaviour would be to end up with the same number of
connected clients after some time.
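From the application side, a small cassandra-driver sketch like the one below
(contact point is a placeholder) at least shows whether the driver considers
every node up, which is the first thing to rule out:

# List the hosts the driver knows about and whether it marks them up.
from cassandra.cluster import Cluster

cluster = Cluster(["10.0.0.1"])
session = cluster.connect()
for host in cluster.metadata.all_hosts():
    print(host.address, host.datacenter, "up" if host.is_up else "down")
cluster.shutdown()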


-- 
BR,
Michał Łowicki


compaction became super slow after interrupted repair

2015-09-26 Thread Michał Łowicki
Hi,

Running C* 2.1.8 cluster in two data centers with 6 nodes each. I've
started running repair sequentially on each node (`nodetool repair
--parallel --in-local-dc`).

While running repair the number of SSTables grows radically, as do pending
compaction tasks. That's fine, as a node usually recovers within a couple of
hours after finishing repair (
https://www.dropbox.com/s/xzcndf5596mq7rm/Screenshot%202015-09-26%2016.17.44.png?dl=0).
One experiment showed that increasing compaction throughput and the number of
compactors mitigates this problem.

Unfortunately one node didn't recover... (
https://www.dropbox.com/s/nphnsaf2rbfm0bq/Screenshot%202015-09-26%2016.20.56.png?dl=0).
I needed to interrupt repair as the node was running out of disk space. I hoped
that within a couple of hours the node would catch up with compaction, but it
didn't happen even after 5 days.

I've tried increasing throughput, disabling throttling, increasing the number
of compactors, disabling binary / thrift / gossip, increasing the heap size and
restarting, but compaction is still super slow.

Tried today to run scrub:

root@db2:~# nodetool scrub sync

Aborted scrubbing atleast one column family in keyspace sync, check server
logs for more information.

error: nodetool failed, check server logs

-- StackTrace --

java.lang.RuntimeException: nodetool failed, check server logs

at
org.apache.cassandra.tools.NodeTool$NodeToolCmd.run(NodeTool.java:290)

at org.apache.cassandra.tools.NodeTool.main(NodeTool.java:202)

as well as cleanup:

root@db2:~# nodetool cleanup

Aborted cleaning up atleast one column family in keyspace sync, check
server logs for more information.

error: nodetool failed, check server logs

-- StackTrace --

java.lang.RuntimeException: nodetool failed, check server logs

at
org.apache.cassandra.tools.NodeTool$NodeToolCmd.run(NodeTool.java:290)

at org.apache.cassandra.tools.NodeTool.main(NodeTool.java:202)

Couldn't find anything in logs regarding these runtime exceptions (see log
here - https://www.dropbox.com/s/flmii7fgpyp07q2/db2.lati.system.log?dl=0).

Note that I'm experiencing CASSANDRA-9935 while running repair on each node
from the cluster.

Any help will be much appreciated.

-- 
BR,
Michał Łowicki


Re: Garbage collector launched on all nodes at once

2015-06-17 Thread Michał Łowicki
Looks like the memtable heap size is growing rapidly on some nodes (
https://www.dropbox.com/s/3brloiy3fqang1r/Screenshot%202015-06-17%2019.21.49.png?dl=0).
The drops are the points where nodes have been restarted.

On Wed, Jun 17, 2015 at 6:53 PM, Michał Łowicki mlowi...@gmail.com wrote:

 Hi,

 Two datacenters with 6 nodes (2.1.6) each. In each DC garbage collection
 is launched at the same time on each node (See [1] for total GC duration
 per 5 seconds). RF is set to 3. Any ideas?

 [1]
 https://www.dropbox.com/s/bsbyew1jxbe3dgo/Screenshot%202015-06-17%2018.49.48.png?dl=0

 --
 BR,
 Michał Łowicki




-- 
BR,
Michał Łowicki


Garbage collector launched on all nodes at once

2015-06-17 Thread Michał Łowicki
Hi,

Two datacenters with 6 nodes (2.1.6) each. In each DC garbage collection is
launched at the same time on each node (See [1] for total GC duration per 5
seconds). RF is set to 3. Any ideas?

[1]
https://www.dropbox.com/s/bsbyew1jxbe3dgo/Screenshot%202015-06-17%2018.49.48.png?dl=0
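For reference, a "total GC duration per 5 seconds" series like the one in [1]
can be derived from gc.log with a small sketch like this (log path is an
example; the line format matches the one discussed in the "How to interpret
some GC logs" thread below):

# Sum GC pause time per 5-second bucket of JVM uptime from gc.log.
import re
from collections import defaultdict

pattern = re.compile(r"^\S+: (\d+\.\d+): \[GC .*?(\d+\.\d+) secs\]")
buckets = defaultdict(float)

with open("/var/log/cassandra/gc.log") as f:
    for line in f:
        m = pattern.search(line)
        if m:
            uptime, pause = float(m.group(1)), float(m.group(2))
            buckets[int(uptime // 5) * 5] += pause

for start in sorted(buckets):
    print("%8ds  %.3fs total GC" % (start, buckets[start]))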

-- 
BR,
Michał Łowicki


Re: How to interpret some GC logs

2015-06-02 Thread Michał Łowicki
On Tue, Jun 2, 2015 at 9:06 AM, Sebastian Martinka 
sebastian.marti...@mercateo.com wrote:

  this should help you:

 https://blogs.oracle.com/poonam/entry/understanding_cms_gc_logs


I don't see such a format described there. The GC-related options passed are:

-XX:+PrintGCDateStamps -Xloggc:/var/log/cassandra/gc.log




 Best Regards,
 Sebastian Martinka



 *From:* Michał Łowicki [mailto:mlowi...@gmail.com]
 *Sent:* Monday, 1 June 2015 11:47
 *To:* user@cassandra.apache.org
 *Subject:* How to interpret some GC logs



 Hi,



 Normally I get logs like:



 2015-06-01T09:19:50.610+0000: 4736.314: [GC 6505591K->4895804K(8178944K), 0.0494560 secs]



 which is fine and understandable but occasionalIy I see something like:

 2015-06-01T09:19:50.661+0000: 4736.365: [GC 4901600K(8178944K), 0.0049600 secs]

 How to interpret it? Does it just omit the part before the "->" arrow, i.e.
 the memory occupied before the GC cycle?

 --

 BR,
 Michał Łowicki




-- 
BR,
Michał Łowicki


Re: How to interpret some GC logs

2015-06-02 Thread Michał Łowicki
On Mon, Jun 1, 2015 at 7:25 PM, Jason Wee peich...@gmail.com wrote:

 can you tell what jvm is that?


root@db2:~# java -version

java version 1.7.0_80

Java(TM) SE Runtime Environment (build 1.7.0_80-b15)

Java HotSpot(TM) 64-Bit Server VM (build 24.80-b11, mixed mode)



 jason

 On Mon, Jun 1, 2015 at 5:46 PM, Michał Łowicki mlowi...@gmail.com wrote:

 Hi,

 Normally I get logs like:

 2015-06-01T09:19:50.610+0000: 4736.314: [GC 6505591K->4895804K(8178944K), 0.0494560 secs]

 which is fine and understandable but occasionalIy I see something like:

 2015-06-01T09:19:50.661+0000: 4736.365: [GC 4901600K(8178944K), 0.0049600 secs]

 How to interpret it? Does it just omit the part before the "->" arrow, i.e.
 the memory occupied before the GC cycle?
 --
 BR,
 Michał Łowicki





-- 
BR,
Michał Łowicki


How to interpret some GC logs

2015-06-01 Thread Michał Łowicki
Hi,

Normally I get logs like:

2015-06-01T09:19:50.610+0000: 4736.314: [GC 6505591K->4895804K(8178944K), 0.0494560 secs]

which is fine and understandable but occasionalIy I see something like:

2015-06-01T09:19:50.661+0000: 4736.365: [GC 4901600K(8178944K), 0.0049600 secs]

How to interpret it? Does it just omit the part before the "->" arrow, i.e. the
memory occupied before the GC cycle?
-- 
BR,
Michał Łowicki


Compaction freezes

2015-05-10 Thread Michał Łowicki
Hi,

Using C* 2.1.5 and a table with leveled compaction, I've found that the number
of pending tasks is around 100. It turned out that one table cannot be
compacted and compaction always stops at the same point (more or less):

Compaction   sync   entity_by_id   4.91 GB   12.34 GB   bytes   39.79%

`service cassandra restart`

Compaction   sync   entity_by_id   4.91 GB   12.35 GB   bytes   39.74%

`service cassandra restart`

Compaction   sync   entity_by_id   4.9 GB    12.33 GB   bytes   39.77%

`service cassandra restart`

Compaction   sync   entity_by_id   4.89 GB   12.32 GB   bytes   39.73%

After doubling the heap size (cassandra-env.sh) compaction went fine, but I
consider this change a temporary solution, and after a while compaction started
to freeze again anyway.

Is there any way to get insight into compaction, so as to get an answer to why
it freezes and basically what the compactor is currently doing? I've enabled
DEBUG logging but it's too verbose as the node is getting some traffic. Can I
enable DEBUG for compaction only?
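On the last question: if I remember correctly, since 2.1 nodetool can change
the log level for a single package at runtime, so DEBUG can be limited to the
compaction package. Wrapped in Python here only to keep the examples in one
language - it's really just two commands:

# Enable DEBUG for the compaction package only, then revert to INFO.
import subprocess

subprocess.check_call(["nodetool", "setlogginglevel",
                       "org.apache.cassandra.db.compaction", "DEBUG"])
# ...tail /var/log/cassandra/system.log while the stuck compaction runs...
subprocess.check_call(["nodetool", "setlogginglevel",
                       "org.apache.cassandra.db.compaction", "INFO"])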
-- 
BR,
Michał Łowicki


Re: C* 2.1.2 invokes oom-killer

2015-02-23 Thread Michał Łowicki
After couple of days it's still behaving fine. Case closed.

On Thu, Feb 19, 2015 at 11:15 PM, Michał Łowicki mlowi...@gmail.com wrote:

 Upgrade to 2.1.3 seems to help so far. After ~12 hours total memory
 consumption grew from 10GB to 10.5GB.

 On Thu, Feb 19, 2015 at 2:02 PM, Carlos Rolo r...@pythian.com wrote:

 Then you are probably hitting a bug... Trying to find out in Jira. The
 bad news is the fix is only to be released on 2.1.4. Once I find it out I
 will post it here.

 Regards,

 Carlos Juzarte Rolo
 Cassandra Consultant

 Pythian - Love your data

 rolo@pythian | Twitter: cjrolo | Linkedin: linkedin.com/in/carlosjuzarterolo
 Tel: 1649
 www.pythian.com

 On Thu, Feb 19, 2015 at 12:16 PM, Michał Łowicki mlowi...@gmail.com
 wrote:

 |trickle_fsync| has been enabled for long time in our settings (just
 noticed):

 trickle_fsync: true

 trickle_fsync_interval_in_kb: 10240

 On Thu, Feb 19, 2015 at 12:12 PM, Michał Łowicki mlowi...@gmail.com
 wrote:



 On Thu, Feb 19, 2015 at 11:02 AM, Carlos Rolo r...@pythian.com wrote:

 Do you have trickle_fsync enabled? Try to enable that and see if it
 solves your problem, since you are getting out of non-heap memory.

 Another question, is always the same nodes that die? Or is 2 out of 4
 that die?


 Always the same nodes. Upgraded to 2.1.3 two hours ago so we'll monitor
 whether the issue has been fixed there. If not, will try to enable
 |trickle_fsync|.



 Regards,

 Carlos Juzarte Rolo
 Cassandra Consultant

 Pythian - Love your data

 rolo@pythian | Twitter: cjrolo | Linkedin: linkedin.com/in/carlosjuzarterolo
 Tel: 1649
 www.pythian.com

 On Thu, Feb 19, 2015 at 10:49 AM, Michał Łowicki mlowi...@gmail.com
 wrote:



 On Thu, Feb 19, 2015 at 10:41 AM, Carlos Rolo r...@pythian.com
 wrote:

 So compaction doesn't seem to be your problem (You can check with
 nodetool compactionstats just to be sure).


 pending tasks: 0



 How much is your write latency on your column families? I had OOM
 related to this before, and there was a tipping point around 70ms.


 Write request latency is below 0.05 ms/op (avg). Checked with
 OpsCenter.



 --






 --
 BR,
 Michał Łowicki



 --






 --
 BR,
 Michał Łowicki




 --
 BR,
 Michał Łowicki



 --






 --
 BR,
 Michał Łowicki




-- 
BR,
Michał Łowicki


Re: C* 2.1.2 invokes oom-killer

2015-02-19 Thread Michał Łowicki
In all tables the SSTable count is below 30.

On Thu, Feb 19, 2015 at 9:43 AM, Carlos Rolo r...@pythian.com wrote:

 Can you check how many SSTables you have? It is more or less a known fact
 that 2.1.2 has lots of problems with compaction, so an upgrade can solve it.
 But a high number of SSTables can confirm that compaction is indeed your
 problem, not something else.

 Regards,

 Carlos Juzarte Rolo
 Cassandra Consultant

 Pythian - Love your data

 rolo@pythian | Twitter: cjrolo | Linkedin: linkedin.com/in/carlosjuzarterolo
 Tel: 1649
 www.pythian.com

 On Thu, Feb 19, 2015 at 9:16 AM, Michał Łowicki mlowi...@gmail.com
 wrote:

 We don't have other things running on these boxes and C* is consuming all
 the memory.

 Will try to upgrade to 2.1.3 and if it doesn't help, downgrade to 2.1.2.

 —
 Michał


 On Thu, Feb 19, 2015 at 2:39 AM, Jacob Rhoden jacob.rho...@me.com
 wrote:

 Are you tweaking the nice priority on Cassandra? (Type `man nice` if
 you don't know much about it.) Certainly improving Cassandra's nice score
 becomes important when you have other things running on the server, like
 scheduled jobs or people logging in to the server and doing things.

 __
 Sent from iPhone

 On 19 Feb 2015, at 5:28 am, Michał Łowicki mlowi...@gmail.com wrote:

  Hi,

 Couple of times a day 2 out of 4 members cluster nodes are killed

 root@db4:~# dmesg | grep -i oom
 [4811135.792657] [ pid ]   uid  tgid total_vm  rss cpu oom_adj
 oom_score_adj name
 [6559049.307293] java invoked oom-killer: gfp_mask=0x201da, order=0,
 oom_adj=0, oom_score_adj=0

 Nodes are using 8GB heap (confirmed with *nodetool info*) and aren't
 using row cache.

 Noticed that couple of times a day used RSS is growing really fast
 within couple of minutes and I see CPU spikes at the same time -
 https://www.dropbox.com/s/khco2kdp4qdzjit/Screenshot%202015-02-18%2015.10.54.png?dl=0
 .

 Could be related to compaction but after compaction is finished used RSS
 doesn't shrink. Output from pmap when C* process uses 50GB RAM (out of
 64GB) is available on http://paste.ofcode.org/ZjLUA2dYVuKvJHAk9T3Hjb.
 At the time dump was made heap usage is far below 8GB (~3GB) but total RSS
 is ~50GB.

 Any help will be appreciated.

 --
 BR,
 Michał Łowicki




 --






-- 
BR,
Michał Łowicki


Re: C* 2.1.2 invokes oom-killer

2015-02-19 Thread Michał Łowicki
On Thu, Feb 19, 2015 at 10:41 AM, Carlos Rolo r...@pythian.com wrote:

 So compaction doesn't seem to be your problem (You can check with nodetool
 compactionstats just to be sure).


pending tasks: 0



 How much is your write latency on your column families? I had OOM related
 to this before, and there was a tipping point around 70ms.


Write request latency is below 0.05 ms/op (avg). Checked with OpsCenter.



 --






-- 
BR,
Michał Łowicki


C* 2.1.2 invokes oom-killer

2015-02-18 Thread Michał Łowicki
Hi,

A couple of times a day 2 out of the 4 nodes in the cluster are killed:

root@db4:~# dmesg | grep -i oom
[4811135.792657] [ pid ]   uid  tgid total_vm  rss cpu oom_adj
oom_score_adj name
[6559049.307293] java invoked oom-killer: gfp_mask=0x201da, order=0,
oom_adj=0, oom_score_adj=0

Nodes are using 8GB heap (confirmed with *nodetool info*) and aren't using
row cache.

Noticed that a couple of times a day the used RSS grows really fast within a
couple of minutes, and I see CPU spikes at the same time -
https://www.dropbox.com/s/khco2kdp4qdzjit/Screenshot%202015-02-18%2015.10.54.png?dl=0
.

It could be related to compaction, but after compaction finishes the used RSS
doesn't shrink. Output from pmap when the C* process uses 50GB RAM (out of
64GB) is available at http://paste.ofcode.org/ZjLUA2dYVuKvJHAk9T3Hjb. At the
time the dump was made, heap usage was far below 8GB (~3GB) but total RSS was
~50GB.

Any help will be appreciated.
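A small sampler like the sketch below (Linux-only; the process match is an
assumption) makes it easier to catch that growth window and correlate it with
compactions or GC:

# Sample the Cassandra process RSS every few seconds.
import subprocess
import time

pid = subprocess.check_output(["pgrep", "-f", "CassandraDaemon"]).split()[0].decode()

while True:
    with open("/proc/%s/status" % pid) as f:
        rss_line = next(l for l in f if l.startswith("VmRSS"))
    print("%s %s" % (time.strftime("%H:%M:%S"), rss_line.strip()))
    time.sleep(5)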

-- 
BR,
Michał Łowicki


Re: Timeouts but returned consistency level is invalid

2015-01-30 Thread Michał Łowicki
Thanks Philip. This explains why I see ALL. Any idea why sometimes ONE is 
returned?

—
Michał

On Fri, Jan 30, 2015 at 4:18 PM, Philip Thompson
philip.thomp...@datastax.com wrote:

 Jan is incorrect. Keyspaces do not have consistency levels set on them.
 Consistency Levels are always set by the client. You are almost certainly
 running into https://issues.apache.org/jira/browse/CASSANDRA-7947 which is
 fixed in 2.1.3 and 2.0.12.
 On Fri, Jan 30, 2015 at 8:37 AM, Michał Łowicki mlowi...@gmail.com wrote:
 Hi Jan,

 I'm using only one keyspace. Even if it defaults to ONE why sometimes ALL
 is returned?

 On Fri, Jan 30, 2015 at 2:28 PM, Jan cne...@yahoo.com wrote:

 HI Michal;

 The consistency level defaults to ONE for all write and read operations.
 However consistency level is also set for the keyspace.

 Could it be possible that your queries are spanning multiple keyspaces
 which bear different levels of consistency ?

 cheers
 Jan

 C* Architect


   On Friday, January 30, 2015 1:36 AM, Michał Łowicki mlowi...@gmail.com
 wrote:


 Hi,

 We're using C* 2.1.2, django-cassandra-engine which in turn uses
 cqlengine. LOCAL_QUROUM is set as default consistency level. From time to
 time we get timeouts while talking to the database but what is strange
 returned consistency level is not LOCAL_QUROUM:

 code=1200 [Coordinator node timed out waiting for replica nodes' responses] 
 message=Operation timed out - received only 3 responses. 
 info={'received_responses': 3, 'required_responses': 4, 'consistency': 
 'ALL'}


 code=1200 [Coordinator node timed out waiting for replica nodes' responses] 
 message=Operation timed out - received only 1 responses. 
 info={'received_responses': 1, 'required_responses': 2, 'consistency': 
 'LOCAL_QUORUM'}


 code=1100 [Coordinator node timed out waiting for replica nodes' responses] 
 message=Operation timed out - received only 0 responses. 
 info={'received_responses': 0, 'required_responses': 1, 'consistency': 
 'ONE'}


 Any idea why it might happen?

 --
 BR,
 Michał Łowicki





 --
 BR,
 Michał Łowicki


Timeouts but returned consistency level is invalid

2015-01-30 Thread Michał Łowicki
Hi,

We're using C* 2.1.2 and django-cassandra-engine, which in turn uses cqlengine.
LOCAL_QUORUM is set as the default consistency level. From time to time we get
timeouts while talking to the database, but what is strange is that the
returned consistency level is not LOCAL_QUORUM:

code=1200 [Coordinator node timed out waiting for replica nodes'
responses] message=Operation timed out - received only 3 responses.
info={'received_responses': 3, 'required_responses': 4, 'consistency':
'ALL'}


code=1200 [Coordinator node timed out waiting for replica nodes'
responses] message=Operation timed out - received only 1 responses.
info={'received_responses': 1, 'required_responses': 2, 'consistency':
'LOCAL_QUORUM'}


code=1100 [Coordinator node timed out waiting for replica nodes'
responses] message=Operation timed out - received only 0 responses.
info={'received_responses': 0, 'required_responses': 1, 'consistency':
'ONE'}


Any idea why it might happen?
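(As noted in the reply above, the consistency level is always set by the
client; for reference, a minimal cassandra-driver sketch pinning LOCAL_QUORUM
on a single statement - contact point and query are placeholders:)

# Pin LOCAL_QUORUM on one statement with the plain Python driver.
from cassandra import ConsistencyLevel
from cassandra.cluster import Cluster
from cassandra.query import SimpleStatement

session = Cluster(["10.0.0.1"]).connect("sync")
stmt = SimpleStatement(
    "SELECT * FROM entity LIMIT 1",
    consistency_level=ConsistencyLevel.LOCAL_QUORUM)
print(session.execute(stmt))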

-- 
BR,
Michał Łowicki


Re: Timeouts but returned consistency level is invalid

2015-01-30 Thread Michał Łowicki
Hi Jan,

I'm using only one keyspace. Even if it defaults to ONE, why is ALL sometimes
returned?

On Fri, Jan 30, 2015 at 2:28 PM, Jan cne...@yahoo.com wrote:

 HI Michal;

 The consistency level defaults to ONE for all write and read operations.
 However consistency level is also set for the keyspace.

 Could it be possible that your queries are spanning multiple keyspaces
 which bear different levels of consistency ?

 cheers
 Jan

 C* Architect


   On Friday, January 30, 2015 1:36 AM, Michał Łowicki mlowi...@gmail.com
 wrote:


 Hi,

 We're using C* 2.1.2, django-cassandra-engine which in turn uses
 cqlengine. LOCAL_QUROUM is set as default consistency level. From time to
 time we get timeouts while talking to the database but what is strange
 returned consistency level is not LOCAL_QUROUM:

 code=1200 [Coordinator node timed out waiting for replica nodes' responses] 
 message=Operation timed out - received only 3 responses. 
 info={'received_responses': 3, 'required_responses': 4, 'consistency': 'ALL'}


 code=1200 [Coordinator node timed out waiting for replica nodes' responses] 
 message=Operation timed out - received only 1 responses. 
 info={'received_responses': 1, 'required_responses': 2, 'consistency': 
 'LOCAL_QUORUM'}


 code=1100 [Coordinator node timed out waiting for replica nodes' responses] 
 message=Operation timed out - received only 0 responses. 
 info={'received_responses': 0, 'required_responses': 1, 'consistency': 'ONE'}


 Any idea why it might happen?

 --
 BR,
 Michał Łowicki





-- 
BR,
Michał Łowicki


Inconsistencies between two tables if BATCH used

2015-01-15 Thread Michał Łowicki
Hi,

We've got two tables:
* The first one, *entity*, has a log-like structure - whenever an entity is
modified we create a new version of it and put it into the table with a new
mtime, which is part of the compound key. The old one is removed.
* The second one, called *entity_by_id*, is a manually managed index for
*entity*. Having only the id, you can get basic entity attributes from
*entity_by_id*.

While adding an entity we do two inserts - into *entity* and *entity_by_id*
(in this order).
While deleting we do the same in the same order, so first we remove the
record from the *entity* table.

It turned out that these two tables were inconsistent. We had ~260 records
in *entity_by_id* for which there is no corresponding record in *entity*. For
the *entity* table it's much worse: ~7000 of its records are missing from
*entity_by_id*, and that number was growing much faster.

We were using LOCAL_QUORUM. C* 2.1.2. Two datacenters. We didn't get any
exceptions during inserts or deletes. BatchQuery from cqlengine (0.20.0) was
used.

If BatchQuery is not used:

with BatchQuery() as b:
-entity.batch(b).save()
-entity_by_id = EntityById.copy_fields_from(entity)
-entity_by_id.batch(b).save()
+entity.save()
+entity_by_id = EntityById.copy_fields_from(entity)
+entity_by_id.save()


Everything is fine and we don't get more inconsistencies. I've checked what
cqlengine generates and it seems to work as expected:

('BEGIN  BATCH\n  UPDATE sync.entity SET name = %(4)s WHERE user_id =
%(0)s AND data_type_id = %(1)s AND version = %(2)s AND id = %(3)s\n
INSERT INTO sync.entity_by_id (user_id, id, parent_id, deleted,
folder, data_type_id, version) VALUES (%(5)s, %(6)s, %(7)s, %(8)s,
%(9)s, %(10)s, %(11)s)\nAPPLY BATCH;',)

We suspect that it's a problem in C* itself. Any ideas how to debug
what is going on, as BATCH is needed in this case?
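One way to rule cqlengine out would be to issue the same two writes as a LOGGED
batch with the plain driver; a sketch (contact point and values are
placeholders, column names follow the generated CQL above):

# Same two writes as a LOGGED batch via the plain Python driver.
from uuid import uuid4
from cassandra.cluster import Cluster
from cassandra.query import BatchStatement, BatchType

session = Cluster(["10.0.0.1"]).connect("sync")
user_id, entity_id, parent_id = uuid4(), uuid4(), uuid4()

batch = BatchStatement(batch_type=BatchType.LOGGED)
batch.add(
    "UPDATE entity SET name = %s WHERE user_id = %s AND data_type_id = %s "
    "AND version = %s AND id = %s",
    ("some-name", user_id, 1, 1, entity_id))
batch.add(
    "INSERT INTO entity_by_id (user_id, id, parent_id, deleted, folder, "
    "data_type_id, version) VALUES (%s, %s, %s, %s, %s, %s, %s)",
    (user_id, entity_id, parent_id, False, False, 1, 1))
session.execute(batch)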

-- 
BR,
Michał Łowicki


Re: Number of SSTables grows after repair

2015-01-05 Thread Michał Łowicki
@Robert could you point me to some of those issues?

I would be very grateful for some explanation of why this is semi-expected.

On Fri, Jan 2, 2015 at 8:01 PM, Robert Coli rc...@eventbrite.com wrote:

 On Mon, Dec 15, 2014 at 1:51 AM, Michał Łowicki mlowi...@gmail.com
 wrote:

  We've noticed that the number of SSTables grows radically after running
  *repair*. What we did today is to compact everything so that on each node the
  number of SSTables is < 10. After repair it jumped to ~1600 on each node. What
  is interesting is that the size of many of them is very small. The smallest
  ones are ~60 bytes in size (http://paste.ofcode.org/6yyH2X52emPNrKdw3WXW3d)


 This is semi-expected if using vnodes. There are various tickets open to
 address aspects of this issue.


 Table information - http://paste.ofcode.org/32RijfxQkNeb9cx9GAAnM45
 We're using Cassandra 2.1.2.


 https://engineering.eventbrite.com/what-version-of-cassandra-should-i-run/

 =Rob




-- 
BR,
Michał Łowicki


Number of SSTables grows after repair

2014-12-15 Thread Michał Łowicki
Hi,

We've noticed that the number of SSTables grows radically after running
*repair*. What we did today is to compact everything so that on each node the
number of SSTables is < 10. After repair it jumped to ~1600 on each node. What
is interesting is that the size of many of them is very small. The smallest
ones are ~60 bytes in size (http://paste.ofcode.org/6yyH2X52emPNrKdw3WXW3d).

Table information - http://paste.ofcode.org/32RijfxQkNeb9cx9GAAnM45
We're using Cassandra 2.1.2.
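For reference, the observation is easy to reproduce with a small sketch that
counts SSTables per table directory and flags the tiny ones (the data path is
an example):

# Count SSTables per table directory and list files smaller than 1 KB.
import glob
import os
from collections import Counter

DATA_DIR = "/var/lib/cassandra/data"
counts = Counter()

for data_file in glob.glob(os.path.join(DATA_DIR, "*", "*", "*-Data.db")):
    counts[os.path.dirname(data_file)] += 1
    size = os.path.getsize(data_file)
    if size < 1024:
        print("tiny sstable: %s (%d bytes)" % (data_file, size))

for table_dir, n in counts.most_common(10):
    print("%5d sstables in %s" % (n, table_dir))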

-- 
BR,
Michał Łowicki