RE: How do you monitoring Cassandra Cluster?

2018-06-21 Thread Jonathan Baynes
I’m using Quest Foglight; very good if you have the budget for it.

Jonathan Baynes
DBA
Tradeweb Europe Limited
Moor Place  •  1 Fore Street Avenue  •  London EC2Y 9DT
P +44 (0)20 77760988  •  F +44 (0)20 7776 3201  •  M +44 (0)7884111546
jonathan.bay...@tradeweb.com

—
A leading marketplace for electronic 
fixed income, derivatives and ETF trading



From: Rahul Singh [mailto:rahul.xavier.si...@gmail.com]
Sent: 21 June 2018 15:15
To: user@cassandra.apache.org; user@cassandra.apache.org
Subject: Re: How do you monitoring Cassandra Cluster?

I’ve collected a bunch at 
http://leaves.anant.us/#!/?tag=cassandra,monitoring

I recommend Grafana / Prometheus if you don’t have DSE (which has OpsCenter).


--
Rahul Singh
rahul.si...@anant.us

Anant Corporation
On Jun 19, 2018, 1:06 PM -0400, Romain Gérard <romain.ger...@erebe.eu> wrote:

Hi Felipe,

You can use this project 
https://github.com/criteo/cassandra_exporter
 if you are using Prometheus (disclaimer: I am one of its authors).
It includes a Grafana dashboard that aggregates metrics per cluster for 
you, and in the "edit view" of each chart there are hidden queries that let you 
drill down by node.
In any case, you can look at the dashboard JSON to grasp how to aggregate 
queries.
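For example, a cluster-wide rate is usually produced query-side by summing the per-node series. A sketch (the metric and label names below are illustrative, not guaranteed — check what your exporter actually exposes on its /metrics endpoint):

```promql
# Sum the per-node read-request rate into one cluster-wide series.
sum(rate(cassandra_clientrequest_latency_count{clientrequest="read"}[5m]))

# Or keep one series per cluster, if you attach a "cluster" label at scrape time:
sum by (cluster) (rate(cassandra_clientrequest_latency_count[5m]))
```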

Let me know if you need more information regarding the setup.

If you end up using the project, giving it a star is always appreciated.
Regards,
Romain Gérard

On Jun 18 2018, at 3:25 pm, Felipe Esteves <felipe.este...@b2wdigital.com> wrote:

Hi, everyone,

I'm running some tests to monitor Cassandra 3.x with jmx_exporter + prometheus 
+ grafana.
I've managed to config it all and use the dashboard 
https://grafana.com/dashboards/5408

However, I still can't aggregate metrics across the whole cluster, only view nodes 
individually.
Any tips on how to do that?

Also, OpsCenter gives some datacenter-level aggregation; I think it comes from 
nodetool, as I didn't see any metrics for that.
Anyone having success on that?

cheers!

On Wed, Jun 28, 2017 at 19:43, Petrus Gomes 
<petru...@gmail.com> wrote:
I'm using JMX+Prometheus and Grafana.
JMX = 
https://github.com/prometheus/jmx_exporter
Prometheus + Grafana = 
https://prometheus.io/docs/visualization/grafana/

There are some dashboard examples, like this one: 
https://grafana.com/dashboards/371
Looks good.

Thanks,
Petrus Silva

On Wed, Jun 28, 2017 at 5:55 AM, Peng Xiao 
<2535...@qq.com> wrote:
Dear All,

We are currently using Cassandra 2.1.13, and it has grown to 5 TB across 32 
nodes in one DC.
For monitoring, OpsCenter does not send alarms and is not free in higher 
versions, so we have to use a simple JMX + Zabbix template. We now plan to use 
Jolokia + JMX2Graphite to draw the metrics charts.

Could you please advise?

Thanks,
Henry




Re: How do you monitoring Cassandra Cluster?

2018-06-21 Thread Rahul Singh
I’ve collected a bunch at http://leaves.anant.us/#!/?tag=cassandra,monitoring

I recommend Grafana / Prometheus if you don’t have DSE (which has OpsCenter).


--
Rahul Singh
rahul.si...@anant.us

Anant Corporation
On Jun 19, 2018, 1:06 PM -0400, Romain Gérard wrote:
> Hi Felipe,
>
> You can use this project https://github.com/criteo/cassandra_exporter if you 
> are using Prometheus (disclaimer: I am one of its authors).
> It includes a Grafana dashboard that aggregates metrics per cluster for 
> you, and in the "edit view" of each chart there are hidden queries that let 
> you drill down by node.
> In any case, you can look at the dashboard JSON to grasp how to aggregate 
> queries.
>
> Let me know if you need more information regarding the setup.
>
> If you end up using the project, giving it a star is always appreciated.
> Regards,
> Romain Gérard
>
>
> On Jun 18 2018, at 3:25 pm, Felipe Esteves wrote:
> >
> > Hi, everyone,
> >
> > I'm running some tests to monitor Cassandra 3.x with jmx_exporter + 
> > prometheus + grafana.
> > I've managed to config it all and use the dashboard 
> > https://grafana.com/dashboards/5408
> >
> > However, I still can't aggregate metrics across the whole cluster, only view 
> > nodes individually.
> > Any tips on how to do that?
> >
> > Also, OpsCenter gives some datacenter-level aggregation; I think it comes 
> > from nodetool, as I didn't see any metrics for that.
> > Anyone having success on that?
> >
> > cheers!
> >
> > On Wed, Jun 28, 2017 at 19:43, Petrus Gomes wrote:
> > > I'm using JMX+Prometheus and Grafana.
> > > JMX = https://github.com/prometheus/jmx_exporter
> > > Prometheus + Grafana = https://prometheus.io/docs/visualization/grafana/
> > >
> > > There are some dashboard examples, like this one: 
> > > https://grafana.com/dashboards/371
> > > Looks good.
> > >
> > > Thanks,
> > > Petrus Silva
> > >
> > > On Wed, Jun 28, 2017 at 5:55 AM, Peng Xiao <2535...@qq.com> wrote:
> > > > Dear All,
> > > >
> > > > We are currently using Cassandra 2.1.13, and it has grown to 5 TB 
> > > > across 32 nodes in one DC.
> > > > For monitoring, OpsCenter does not send alarms and is not free in higher 
> > > > versions, so we have to use a simple JMX + Zabbix template. We now plan 
> > > > to use Jolokia + JMX2Graphite to draw the metrics charts.
> > > >
> > > > Could you please advise?
> > > >
> > > > Thanks,
> > > > Henry
> > >
> > >
> > >
> > >
> > > This message may include confidential information and only the intended 
> > > addresses have the right to use it as is, or any part of it. A wrong 
> > > transmission does not break its confidentiality. If you've received it 
> > > because of a mistake or erroneous transmission, please notify the sender 
> > > and delete it from your system immediately. This communication 
> > > environment is controlled and monitored.
> > >
> > > B2W Digital
> > >
> > >
> > --
> > Felipe Esteves
> >
> > Tecnologia
> >
> > felipe.este...@b2wdigital.com
> >
> > Tel.: (21) 3504-7162 ramal 57162


RE: [EXTERNAL] Re: Tombstone

2018-06-21 Thread Rahul Singh
Queues can be implemented in Cassandra, even though everyone believes it’s an 
“anti-pattern”, as long as the design fits Cassandra’s data model.

In this case, I would do a logical / soft delete on the data to invalidate it 
from the queries that access it, and put a TTL on the data so it is deleted 
automatically later. You could have a default TTL, or set a TTL on your 
actual “delete”, which would schedule the deletion in the future, for example 
3 days from now.
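A minimal CQL sketch of that soft-delete-plus-TTL idea (the table and column names here are hypothetical, not from the original thread):

```sql
-- Hypothetical queue table; default_time_to_live makes every entry expire
-- on its own without an explicit DELETE.
CREATE TABLE queue_events (
    hour    timestamp,
    bucket  int,
    id      timeuuid,
    payload text,
    deleted boolean,
    PRIMARY KEY ((hour, bucket), id)
) WITH default_time_to_live = 259200;  -- 3 days

-- "Soft delete": mark the row invalid instead of issuing a real DELETE;
-- readers filter out deleted rows, and the data still expires via the TTL.
UPDATE queue_events USING TTL 259200
SET deleted = true
WHERE hour = '2018-06-21 10:00:00' AND bucket = 3
  AND id = 4d6b1c10-753d-11e8-b568-0800200c9a66;
```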

Some sources of inspiration on how people have been doing queues on Cassandra:

cherami by Uber
CMB by Comcast
cassieq — don’t remember.



--
Rahul Singh
rahul.si...@anant.us

Anant Corporation
On Jun 19, 2018, 12:39 PM -0400, Durity, Sean R wrote:
> This sounds like a queue pattern, which is typically an anti-pattern for 
> Cassandra. I would say that it is very difficult to get the access patterns, 
> tombstones, and everything else lined up properly to solve a queue problem.
>
>
> Sean Durity
>
> From: Abhishek Singh 
> Sent: Tuesday, June 19, 2018 10:41 AM
> To: user@cassandra.apache.org
> Subject: [EXTERNAL] Re: Tombstone
>
> The partition key is made of a datetime (basically the date truncated to the 
> hour) and a bucket. I think your RCA may be correct, since we are deleting the 
> partition's rows one by one rather than in a batch, so files may be overlapping 
> for that particular partition. A scheduled thread picks the rows for a 
> partition based on the current datetime and bucket number, and checks whether 
> each row's entry is past due; if it is, we trigger an event and remove 
> the entry.
>
>
>
> On Tue 19 Jun, 2018, 7:58 PM Jeff Jirsa wrote:
> > The most likely explanation is tombstones in files that won’t be collected 
> > as they potentially overlap data in other files with a lower timestamp 
> > (especially true if your partition key doesn’t change and you’re writing 
> > and deleting data within a partition)
> >
> > --
> > Jeff Jirsa
> >
> >
> > > On Jun 19, 2018, at 3:28 AM, Abhishek Singh wrote:
> > >
> > > Hi all,
> > > We are using Cassandra to store time-series events for batch 
> > > processing. Once a particular batch (based on the hour) is processed, we 
> > > delete the entries, but we were left with almost 18% of deletes marked as 
> > > tombstones.
> > > I ran compaction on the particular CF, but the tombstone count didn't 
> > > come down.
> > > Can anyone suggest the optimal tuning / recommended practice for the 
> > > compaction strategy and gc_grace period, with 100k entries and deletes 
> > > every hour?
> > >
> > > Warm Regards
> > > Abhishek Singh
> >
> > -
> > To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
> > For additional commands, e-mail: user-h...@cassandra.apache.org


RE: how to avoid lightwieght transactions

2018-06-21 Thread Rahul Singh
A read-before-write is always going to cost tremendously more than just writing. 
Depending on your architecture, you may consider both of the options described.

If you have a CQRS architecture and are processing an event queue — doing LWT / 
read-before-write — then your “write” is processed asynchronously by your 
command processor.

If you are directly interacting with Cassandra, and need extremely fast 
writes with minimal latency, I’d use the append-only method.

CQRS just separates the event processing from the reading — and, when combined 
with an asynchronous architecture in your application such as an event queue, it 
basically mitigates / hedges the performance loss of doing LWT.

You can always use CQRS without LWT.

Rahul
On Jun 21, 2018, 4:38 AM -0400, Jacques-Henri Berthemet wrote:
> Hi,
>
> Another way would be to use a compound primary key, with Id as the partition 
> key and time as a TimeUUID clustering column. Then you’ll always insert 
> records, never update; for each “transaction” you’ll keep a row in the 
> partition. When you read back all the rows for that partition by Id, you’ll 
> process all of them to know the real status. For example, if the final status 
> must be “completed” and you have:
>
> Id, TimeUUID, status
> 1, t0, added
> 1, t1, added
> 1, t2, completed
> 1, t3, added
>
> When reading back you’ll just discard the last row.
>
>
> If you’re only concerned about the “insert or update” case but the data is 
> actually the same, you can always insert. If you insert on an existing record 
> it will just overwrite it; if you update a non-existing record it will 
> insert the data. In Cassandra there is not much difference between insert and 
> update operations.
>
> Regards,
> --
> Jacques-Henri Berthemet
>
> From: Rajesh Kishore [mailto:rajesh10si...@gmail.com]
> Sent: Thursday, June 21, 2018 7:45 AM
> To: user@cassandra.apache.org
> Subject: Re: how to avoid lightwieght transactions
>
> Hi,
>
> I think the LWT feature was introduced for exactly your kind of use case - you 
> don't want other requests updating the same data at the same time; it uses the 
> Paxos algorithm (similar to a two-phase commit).
> So, IMO your use case makes perfect sense for LWT, to avoid concurrent 
> updates.
> If your issue is not the concurrent-update one, then IMHO you may want to 
> split this into two steps:
> - get the transcation_type at QUORUM (or a higher consistency level)
> - then conditionally update the row at QUORUM (or a higher 
> consistency level)
> But remember, this won't be atomic in nature and won't solve the concurrent 
> update issue if you have one.
>
> Regards,
> Rajesh
>
>
>
> On Wed, Jun 20, 2018 at 2:59 AM, manuj singh wrote:
> > Hi all,
> > we have a use case where we need to update our rows frequently. Now, in 
> > order to do so without overriding updates, we have to resort to 
> > lightweight transactions.
> > Since lightweight transactions are expensive (could be 4 times as expensive 
> > as a normal insert), how do we model around them?
> >
> > e.g i have a table where
> >
> > CREATE TABLE multirow (
> >     id text,
> >     time text,
> >     transcation_type text,
> >     status text,
> >     PRIMARY KEY (id, time)
> > )
> >
> > So let's say we update the status column multiple times. The first time we 
> > update, we also have to make sure that the transaction exists; otherwise a 
> > normal update will insert it, and then when the original insert comes in it 
> > will override the update.
> > So in order to fix that, we need to use lightweight transactions.
> >
> > Is there another way I can model this so that we can avoid the lightweight 
> > transactions?
> >
> >
> > Thanks
> >
>


RE: how to avoid lightwieght transactions

2018-06-21 Thread Jacques-Henri Berthemet
Hi,

Another way would be to use a compound primary key, with Id as the partition key 
and time as a TimeUUID clustering column. Then you’ll always insert records, 
never update; for each “transaction” you’ll keep a row in the partition. When 
you read back all the rows for that partition by Id, you’ll process all of them 
to know the real status. For example, if the final status must be “completed” 
and you have:

Id, TimeUUID, status
1, t0, added
1, t1, added
1, t2, completed
1, t3, added

When reading back you’ll just discard the last row.


If you’re only concerned about the “insert or update” case but the data is 
actually the same, you can always insert. If you insert on an existing record it 
will just overwrite it; if you update a non-existing record it will insert the 
data. In Cassandra there is not much difference between insert and update 
operations.
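A rough CQL sketch of this append-only model (table and value names are illustrative):

```sql
-- Every state change is a new row; reads pick the newest row first.
CREATE TABLE transaction_status (
    id     text,
    time   timeuuid,
    status text,
    PRIMARY KEY (id, time)
) WITH CLUSTERING ORDER BY (time DESC);

-- Each event is a plain INSERT: no read-before-write, no LWT.
INSERT INTO transaction_status (id, time, status) VALUES ('1', now(), 'added');
INSERT INTO transaction_status (id, time, status) VALUES ('1', now(), 'completed');

-- With DESC clustering, the most recent status comes back first.
SELECT status FROM transaction_status WHERE id = '1' LIMIT 1;
```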

Regards,
--
Jacques-Henri Berthemet

From: Rajesh Kishore [mailto:rajesh10si...@gmail.com]
Sent: Thursday, June 21, 2018 7:45 AM
To: user@cassandra.apache.org
Subject: Re: how to avoid lightwieght transactions

Hi,

I think the LWT feature was introduced for exactly your kind of use case - you 
don't want other requests updating the same data at the same time; it uses the 
Paxos algorithm (similar to a two-phase commit).
So, IMO your use case makes perfect sense for LWT, to avoid concurrent updates.
If your issue is not the concurrent-update one, then IMHO you may want to split 
this into two steps:
- get the transcation_type at QUORUM (or a higher consistency level)
- then conditionally update the row at QUORUM (or a higher 
consistency level)
But remember, this won't be atomic in nature and won't solve the concurrent 
update issue if you have one.
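As a sketch against the multirow table from the original question (the id and time values are made up):

```sql
-- Single-statement LWT: the update only applies if the row already exists
-- with the expected transcation_type (a Paxos round, hence the extra cost).
UPDATE multirow
SET status = 'completed'
WHERE id = 'tx-42' AND time = '2018-06-20T10:00'
IF transcation_type = 'payment';

-- The non-atomic two-step alternative: first read at QUORUM (the consistency
-- level is set on the statement by the driver, not in the CQL text) ...
SELECT transcation_type FROM multirow
WHERE id = 'tx-42' AND time = '2018-06-20T10:00';
-- ... then, if the value matched, issue a plain UPDATE at QUORUM.
UPDATE multirow SET status = 'completed'
WHERE id = 'tx-42' AND time = '2018-06-20T10:00';
```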

Regards,
Rajesh



On Wed, Jun 20, 2018 at 2:59 AM, manuj singh 
<s.manuj...@gmail.com> wrote:
Hi all,
we have a use case where we need to update our rows frequently. Now, in order to 
do so without overriding updates, we have to resort to lightweight 
transactions.
Since lightweight transactions are expensive (could be 4 times as expensive as a 
normal insert), how do we model around them?

e.g i have a table where


CREATE TABLE multirow (
    id text,
    time text,
    transcation_type text,
    status text,
    PRIMARY KEY (id, time)
)



So let's say we update the status column multiple times. The first time we 
update, we also have to make sure that the transaction exists; otherwise a 
normal update will insert it, and then when the original insert comes in it will 
override the update.

So in order to fix that, we need to use lightweight transactions.



Is there another way I can model this so that we can avoid the lightweight 
transactions?





Thanks





Re: Network problems during repair make it hang on "Wait for validation to complete"

2018-06-21 Thread Dmitry Simonov
In the previous message, I pasted source code from Cassandra 2.2.8 by mistake.
I have re-checked against the 2.2.11 source: these lines are the same.
2018-06-21 2:49 GMT+05:00 Dmitry Simonov:

> Hello!
>
> Using Cassandra 2.2.11, I observe behaviour, that is very similar to
> https://issues.apache.org/jira/browse/CASSANDRA-12860
>
> Steps to reproduce:
> 1. Set up a cluster: ccm create five -v 2.2.11 && ccm populate -n 5
> --vnodes && ccm start
> 2. Import some keyspace into it (approx 50 Mb of data)
> 3. Start repair on one node: ccm node2 nodetool repair KEYSPACE
> 4. While repair is still running, disconnect node3: sudo iptables -I
> INPUT -p tcp -d 127.0.0.3 -j DROP
> 5. This repair hangs.
> 6. Restore network connectivity
> 7. Repair is still hanging.
> 8. Following repairs will also hang.
>
> In tpstats I see tasks that make no progress:
>
> $ for i in {1..5}; do echo node$i; ccm node$i nodetool tpstats | grep
> "Repair#"; done
> node1
> Repair#1          1    2255       1    0    0
> node2
> Repair#1          1    2335      26    0    0
> node3
> node4
> Repair#3          1     147    2175    0    0
> node5
> Repair#1          1    2335      17    0    0
>
> In jconsole I see that Repair threads are blocked here:
>
> Name: Repair#1:1
> State: WAITING on 
> com.google.common.util.concurrent.AbstractFuture$Sync@73c5ab7e
> Total blocked: 0  Total waited: 242
>
> Stack trace:
> sun.misc.Unsafe.park(Native Method)
> java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
> java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
> java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:997)
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304)
> com.google.common.util.concurrent.AbstractFuture$Sync.get(AbstractFuture.java:285)
> com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:116)
> com.google.common.util.concurrent.Uninterruptibles.getUninterruptibly(Uninterruptibles.java:135)
> com.google.common.util.concurrent.Futures.getUnchecked(Futures.java:1371)
> org.apache.cassandra.repair.RepairJob.run(RepairJob.java:167)
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> java.lang.Thread.run(Thread.java:748)
>
>
> According to the source code, they are waiting for validations to complete:
>
> # 
> ./apache-cassandra-2.2.8-src/src/java/org/apache/cassandra/repair/RepairJob.java
>  74 public void run()
>  75 {
> ...
> 166 // Wait for validation to complete
> 167 Futures.getUnchecked(validations);
>
>
> https://issues.apache.org/jira/browse/CASSANDRA-11824 says that problem
> was fixed in 2.2.7, but I use 2.2.11.
>
> Restart of all Cassandra nodes that have hanging tasks (one-by-one) allows
> these tasks to disappear from tpstats. After that repairs work well (until
> next network problem).
>
> I also suppose that long GC times on one node (as well as network issues)
> during repair may also lead to the same problem.
>
> Is it a known issue?
>
> --
> Best Regards,
> Dmitry Simonov
>



-- 
Best Regards,
Dmitry Simonov