Re: time tracking for down node for nodetool repair

2019-04-09 Thread Kunal
Thanks everyone for your valuable suggestion.  Really appreciate it


Regards,
Kunal Vaid

On Mon, Apr 8, 2019 at 7:41 PM Nitan Kainth  wrote:

> Valid suggestion. Stick to the plan: avoid keeping a node down for longer
> than the hinted handoff window, OR increase the window to a larger value if
> you know the outage will take longer than the current setting.
>
>
> Regards,
>
> Nitan
>
> Cell: 510 449 9629
>
> On Apr 8, 2019, at 8:43 PM, Soumya Jena 
> wrote:
>
> Cassandra tracks this, and no new hints are created once the default
> 3-hour window has passed. However, Cassandra will not automatically
> trigger a repair if your node is down for more than 3 hours. The default
> 3-hour hint window is defined in cassandra.yaml; look for
> "max_hint_window_in_ms". It is configurable. Apart from periodic repairs,
> you should also start a repair whenever you bring up a node that has
> missed writes.
>
> One more thing: if a node has been down for a long time and has missed a
> lot of writes, it can sometimes be better to bootstrap it as a fresh node
> rather than rejoining it and then running repair.
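The downtime-versus-hint-window decision above can be sketched in a few lines. This is a hedged illustration of the advice, not Cassandra's internal logic; the constant mirrors the cassandra.yaml default.

```python
# Hedged sketch: decide whether a returning node needs a repair, based on
# how long it was down relative to max_hint_window_in_ms (default 3 hours).

DEFAULT_MAX_HINT_WINDOW_IN_MS = 3 * 60 * 60 * 1000  # 3 hours, as in cassandra.yaml

def needs_repair(down_at_ms: int, up_at_ms: int,
                 max_hint_window_in_ms: int = DEFAULT_MAX_HINT_WINDOW_IN_MS) -> bool:
    """True if the node was down longer than the hint window, meaning
    hints alone cannot have captured all of the missed writes."""
    return (up_at_ms - down_at_ms) > max_hint_window_in_ms

# A node down for 2 hours is still covered by hints; 4 hours is not.
assert not needs_repair(0, 2 * 60 * 60 * 1000)
assert needs_repair(0, 4 * 60 * 60 * 1000)
```

As the thread concludes, this tracking is the operator's job: Cassandra stops writing hints after the window but does not schedule the repair for you.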
>
> On Mon, Apr 8, 2019 at 4:49 PM Stefan Miklosovic <
> stefan.mikloso...@instaclustr.com> wrote:
>
>> Ah, I see it is the default for hinted handoffs. I was somehow thinking
>> it was a bigger figure; I do not know why :)
>>
>> I would say you should run repairs continuously/periodically so you do
>> not even have to think about it; ideally it runs in the background on a
>> schedule.
>>
>> Regards
>>
>> On Tue, 9 Apr 2019 at 04:19, Kunal  wrote:
>> >
>> > Hello everyone..
>> >
>> >
>> >
>> > I have a six-node Cassandra cluster, with three nodes in each of two
>> datacenters. If one of the nodes goes down and remains down for more than
>> 3 hours, I have to run nodetool repair. I just wanted to ask whether
>> Cassandra automatically tracks the time when a node goes down, or whether
>> I need to write code to track the time and run repair when the node comes
>> back online after 3 hours.
>> >
>> >
>> > Thanks in anticipation.
>> >
>> > Regards,
>> > Kunal Vaid
>>
>> -
>> To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
>> For additional commands, e-mail: user-h...@cassandra.apache.org
>>
>> --



Regards,
Kunal Vaid


Re: Issue while updating a record in 3 node cassandra cluster deployed using kubernetes

2019-04-09 Thread Stefan Miklosovic
>> I have a 3 node cassandra cluster with Replication factor as 2 and 
>> read-write consistency set to QUORUM.

I am not sure what you want to achieve with this. If you have three
nodes and RF 2, each write gets two replicas. QUORUM on RF 2 requires
both replicas to respond (a quorum of 2 is 2), so if one of the two
replicas holding a record is down, the query fails. In other words, with
RF 2 and QUORUM consistency your cluster is not protected against even a
single failed node.
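The replica arithmetic behind this point can be made explicit. A minimal sketch, assuming simple replica counting and ignoring token placement:

```python
# Quorum math for Cassandra consistency levels: QUORUM = floor(RF/2) + 1.

def quorum(rf: int) -> int:
    """Number of replicas that must respond for a QUORUM read/write."""
    return rf // 2 + 1

def quorum_survives(rf: int, replicas_down: int) -> bool:
    """Can a QUORUM query still succeed with this many replicas down?"""
    return rf - replicas_down >= quorum(rf)

# With RF=2, QUORUM needs both replicas, so one dead replica fails the query.
assert quorum(2) == 2
assert not quorum_survives(2, 1)

# With RF=3, QUORUM needs 2 of 3, so one replica can be down.
assert quorum(3) == 2
assert quorum_survives(3, 1)
```

This is why RF=3 with QUORUM is the common choice: it is the smallest configuration that tolerates a single replica failure.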

On Tue, 9 Apr 2019 at 23:10, Mahesh Daksha  wrote:
>
> Hello All,
>
> I have a 3 node cassandra cluster with Replication factor as 2 and read-write 
> consistency set to QUORUM. We are using Spring data cassandra. All 
> infrastructure is deployed using kubernetes.
>
> Now in the normal use case, many records get inserted into the Cassandra
> table. Then we try to modify/update one of the records using the save
> method of the repository, like below:
>
> ChunkMeta tmpRec = chunkMetaRepository.save(chunkMeta);
>
> After executing the above statement we never see any exception or error,
> but the update still fails silently and intermittently: sometimes the
> record in the db is updated successfully, other times it is not. When we
> print tmpRec it contains the updated and correct value every time, yet
> the updated values are not reflected in the db.
>
> We checked the Cassandra transport TRACE logs on all nodes and found that
> our queries are logged there and executed without any error or exception.
>
> Another odd observation: everything works perfectly if I use a single
> Cassandra node (in Kubernetes) or if we deploy the same infrastructure
> using Ansible (it even works with 3 nodes under Ansible).
>
> It looks like the issue is specific to the 3-node Kubernetes deployment
> of Cassandra; replication among the nodes appears to be the cause.
>
> Please suggest.
>
>
> Below are the contents of  my cassandra Docker file:
>
> FROM ubuntu:16.04
>
> RUN apt-get update && apt-get install -y python sudo lsof vim dnsutils 
> net-tools && apt-get clean && \
> addgroup testuser && useradd -g testuser testuser && usermod --password 
> testuser testuser;
>
> RUN mkdir -p /opt/test && \
> mkdir -p /opt/test/data;
>
> ADD jre8.tar.gz /opt/test/
> ADD apache-cassandra-3.11.0-bin.tar.gz /opt/test/
>
> RUN chmod 755 -R /opt/test/jre && \
> ln -s /opt/test/jre/bin/java /usr/bin/java && \
> mv /opt/test/apache-cassandra* /opt/test/cassandra;
>
> RUN mkdir -p /opt/test/cassandra/logs;
>
> ENV JAVA_HOME /opt/test/jre
> RUN export JAVA_HOME
>
> COPY version.txt /opt/test/cassandra/version.txt
>
> WORKDIR /opt/test/cassandra/bin/
>
> RUN mkdir -p /opt/test/data/saved_caches && \
> mkdir -p /opt/test/data/commitlog && \
> mkdir -p /opt/test/data/hints && \
> chown -R testuser:testuser /opt/test/data && \
> chown -R testuser:testuser /opt/test;
>
> USER testuser
>
> CMD cp /etc/cassandra/cassandra.yml ../conf/conf.yml && perl -p -e 
> 's/\$\{([^}]+)\}/defined $ENV{$1} ? $ENV{$1} : $&/eg; s/\$\{([^}]+)\}//eg' 
> ../conf/conf.yml > ../conf/cassandra.yaml && rm ../conf/conf.yml && 
> ./cassandra -f
>
> Please note that conf.yml is basically the cassandra.yml file, containing
> properties related to Cassandra.
>
>
> Thanks,
>
> Mahesh Daksha




All time blocked in nodetool tpstats

2019-04-09 Thread Abdul Patel
Hi,

My nodetool tpstats is showing high "All time blocked" numbers, and also
around 400 dropped READ messages. Clients are experiencing frequent
timeouts.
A few online forums recommend increasing native_transport_max_threads;
as of now it is commented out with a default of 128.
Is it advisable to increase this, and can it fix the timeout issue?
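Before raising the thread count, it helps to see which pools are actually blocking. A rough sketch of scanning `nodetool tpstats` output for nonzero "All time blocked" counts; the sample text is illustrative, not from a real node, but follows the usual column layout (Pool Name, Active, Pending, Completed, Blocked, All time blocked):

```python
# Flag thread pools whose "All time blocked" counter (last column) is nonzero.

SAMPLE = """\
Pool Name                    Active  Pending  Completed  Blocked  All time blocked
Native-Transport-Requests       128      210   34752546        5           172893
ReadStage                        32        0    9913125        0                0
MutationStage                    32        0   12004421        0                0
"""

def blocked_pools(tpstats_text: str):
    flagged = []
    for line in tpstats_text.splitlines()[1:]:  # skip the header row
        parts = line.split()
        if len(parts) >= 6 and parts[-1].isdigit() and int(parts[-1]) > 0:
            flagged.append(parts[0])
    return flagged

assert blocked_pools(SAMPLE) == ["Native-Transport-Requests"]
```

If Native-Transport-Requests dominates, raising native_transport_max_threads can help, but sustained blocking usually means the cluster is saturated; dropped READ messages point at overload rather than a tuning knob alone.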


Re: Questions about C* performance related to tombstone

2019-04-09 Thread Jon Haddad
Normal deletes are fine.

Sadly there's a lot of hand wringing about tombstones in the generic
sense which leads people to try to work around *every* case where
they're used.  This is unnecessary.  A tombstone over a single row
isn't a problem, especially if you're only fetching that one row back.
Tombstones can be quite terrible under a few conditions:

1. When a range tombstone shadows hundreds / thousands / millions of
rows.  This wasn't even detectable prior to Cassandra 3 unless you
were either looking for it specifically or were doing CPU profiling:
http://thelastpickle.com/blog/2018/07/05/undetectable-tombstones-in-apache-cassandra.html
2. When rows were frequently created then deleted, and scanned over.
This is the queue pattern that we detest so much.
3. When they are created as a side effect of overwriting collections.
This is typically an accident.

The 'active' flag is good if you want to be able to go back and look
at old deleted assignments.  If you don't care about that, use a
normal delete.
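George's scanning argument (quoted below) can be checked with a toy model. This is a hedged illustration, assuming one "cell read" per live row and per tombstone, and is not Cassandra's real read path:

```python
# Toy model: a hard delete leaves X live rows plus Y tombstones to scan;
# a soft delete leaves X + Y rows, of which Y must be read and then
# filtered out on the client. The scan work is the same either way.

def cells_scanned_hard_delete(live: int, tombstones: int) -> int:
    return live + tombstones

def cells_scanned_soft_delete(live: int, inactive: int) -> int:
    return live + inactive  # inactive rows are still read, then discarded

X, Y = 1000, 200
assert cells_scanned_hard_delete(X, Y) == cells_scanned_soft_delete(X, Y) == 1200
```

Which is why the choice between a tombstone and an `active` flag here is about whether you want the deleted data back, not about scan cost.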

Jon

On Tue, Apr 9, 2019 at 7:00 AM Li, George  wrote:
>
> Hi,
>
> I have a table defined like this:
>
> CREATE TABLE myTable (
> course_id text,
> assignment_id text,
> assignment_item_id text,
> data text,
> active boolean,
> PRIMARY KEY (course_id, assignment_id, assignment_item_id)
> );
> i.e. course_id as the partition key and assignment_id, assignment_item_id as 
> clustering keys.
>
> After data is populated, some delete queries by course_id and assignment_id 
> occurs, e.g. "DELETE FROM myTable WHERE course_id = 'C' AND assignment_id = 
> 'A1';". This would create tombstones so query "SELECT * FROM myTable WHERE 
> course_id = 'C';" would be affected, right? Would query "SELECT * FROM 
> myTable WHERE course_id = 'C' AND assignment_id = 'A2';" be affected too?
>
> For query "SELECT * FROM myTable WHERE course_id = 'C';", to workaround the 
> tombstone problem, we are thinking about not doing hard deletes, instead 
> doing soft deletes. So instead of doing "DELETE FROM myTable WHERE course_id 
> = 'C' AND assignment_id = 'A1';", we do "UPDATE myTable SET active = false 
> WHERE course_id = 'C' AND assignment_id = 'A1';". Then in the application, we 
> do query "SELECT * FROM myTable WHERE course_id = 'C';" and filter out 
> records that have "active" equal to "false". I am not really sure this would 
> improve performance because C* still has to scan through all records with the 
> partition key "C". It is just instead of scanning through X records + Y 
> tombstone records with hard deletes that generate tombstones, it now scans 
> through X + Y records with soft deletes and no tombstones. Am I right?
>
> Thanks.
>
> George




Re: How to monitor datastax driver compression performance?

2019-04-09 Thread Jon Haddad
tlp-stress has support for customizing payloads, but it's not
documented very well.  For a given data model (say the KeyValue one),
you can override what tlp-stress will send over.  By default it's
pretty small, a handful of bytes.

If you pass --field.keyvalue.value (the table name + the field name),
you can specify the custom field generator you'd like to use.  For
example, --field.keyvalue.value='random(1,11000)' will generate up to
~10K random characters.  You can also generate text from real words by
using the book(100,200) function (100-200 random words out of books) if
you want something that will compress better.

You can see a (poorly formatted) list of all the customizations you
can do by running `tlp-stress fields`

This is one of the areas I haven't spent enough time on to share with the
world in a carefree manner, but it works.  If you're willing to overlook
the poor docs in this area, I think it might meet your needs.

Regarding compression at the query level vs not, I think you should
look at the overhead first.  I'm betting you'll find it's
insignificant.  That said, you can always create two cluster objects
with two radically different settings if you find you need it.
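Measuring how compressible the payload is can be done offline before any load testing. A rough sketch, assuming zlib as a stand-in (the driver actually negotiates LZ4 or Snappy, which behave similarly in relative terms):

```python
# Compare compression ratios of random characters vs repetitive, word-like
# text, mirroring the random() vs book() generators discussed above.
import random
import string
import zlib

random.seed(7)
random_text = "".join(random.choice(string.ascii_letters)
                      for _ in range(10000)).encode()
wordy_text = " ".join(["cassandra", "compression", "payload"] * 400).encode()

def ratio(payload: bytes) -> float:
    """Compressed size over raw size; lower means better compression."""
    return len(zlib.compress(payload)) / len(payload)

# Word-like text compresses far better than random characters, which is
# why book() is the better generator for a realistic compression test.
assert ratio(wordy_text) < ratio(random_text)
assert ratio(wordy_text) < 0.1
```

If even your real payloads show ratios near 1.0, wire compression will mostly cost CPU for little bandwidth gain.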

On Tue, Apr 9, 2019 at 6:32 AM Gabriel Giussi  wrote:
>
> Does tlp-stress allow us to define the size of rows? I will only see the
> benefit of compression in terms of request rates if the compression ratio
> is significant, i.e. if it requires fewer network round trips.
> Could this be done by generating bigger partitions with the -n and -p
> parameters, i.e. by decreasing -p?
>
> Also, don't you think the driver should allow configuring compression per
> query? One table with wide rows could benefit from compression while
> another with a smaller payload might not.
>
> Thanks for your help Jon.
>
>
> El lun., 8 abr. 2019 a las 19:13, Jon Haddad () escribió:
>>
>> If it were me, I'd look at raw request rates (in terms of requests /
>> second as well as request latency), network throughput and then some
>> flame graphs of both the server and your application:
>> https://github.com/jvm-profiling-tools/async-profiler.
>>
>> I've created an issue in tlp-stress to add compression options for the
>> driver: https://github.com/thelastpickle/tlp-stress/issues/67.  If
>> you're interested in contributing the feature I think tlp-stress will
>> more or less solve the remainder of the problem for you (the load
>> part, not the os numbers).
>>
>> Jon
>>
>>
>>
>>
>> On Mon, Apr 8, 2019 at 7:26 AM Gabriel Giussi  
>> wrote:
>> >
>> > Hi, I'm trying to test whether adding driver compression will bring me
>> > any benefit.
>> > I understand that the trade-off is less bandwidth for increased CPU
>> > usage on both the Cassandra nodes (compression) and client nodes
>> > (decompression), but what are the key metrics, and how do I monitor
>> > them to prove compression is giving good results?
>> > I guess I should look at the latency percentiles reported by
>> > com.datastax.driver.core.Metrics and at CPU usage, but what about
>> > bandwidth usage and compression ratio?
>> > Should I use tcpdump to capture packet lengths coming from the
>> > Cassandra nodes? Would something like tcpdump -n "src port 9042 and tcp[13] & 8 != 0" |
>> > sed -n "s/^.*length \(.*\).*$/\1/p" be enough?
>> >
>> > Thanks
>>




Re: [EXTERNAL] Issue while updating a record in 3 node cassandra cluster deployed using kubernetes

2019-04-09 Thread Mahesh Daksha
Thank you Sean for your response. We suspect the same and are
analyzing/troubleshooting the timestamps associated with our queries.

Thanks,
Mahesh Daksha


On Tue, Apr 9, 2019 at 7:08 PM Durity, Sean R 
wrote:

> My first suspicion would be to look at the server times in the cluster. It
> looks like other cases where a write occurs (with no errors) but the data
> is not retrieved as expected. If the write occurs with an earlier timestamp
> than the existing data, this is the behavior you would see. The write would
> occur, but it would not be the latest data to be retrieved. The write looks
> like it fails silently, but it actually does exactly what it is designed to
> do.
>
>
>
> Sean Durity
>
>
>
> *From:* Mahesh Daksha 
> *Sent:* Tuesday, April 09, 2019 9:10 AM
> *To:* user@cassandra.apache.org
> *Subject:* [EXTERNAL] Issue while updating a record in 3 node cassandra
> cluster deployed using kubernetes
>
>
>
> Hello All,
>
> I have a 3 node cassandra cluster with Replication factor as 2 and
> read-write consistency set to QUORUM. We are using Spring data cassandra.
> All infrastructure is deployed using kubernetes.
>
> Now in the normal use case, many records get inserted into the Cassandra
> table. Then we try to modify/update one of the records using the save
> method of the repository, like below:
>
> ChunkMeta *tmpRec* = chunkMetaRepository.*save*(chunkMeta);
>
> After executing the above statement we never see any exception or error,
> but the update still fails silently and intermittently: sometimes the
> record in the db is updated successfully, other times it is not. When we
> print *tmpRec* it contains the updated and correct value every time, yet
> the updated values are not reflected in the db.
>
> We checked the Cassandra transport TRACE logs on all nodes and found that
> our queries are logged there and executed without any error or exception.
>
> Another odd observation: everything works perfectly if I use a single
> Cassandra node (in Kubernetes) or if we deploy the same infrastructure
> using Ansible (it even works with 3 nodes under Ansible).
>
> It looks like the issue is specific to the 3-node Kubernetes deployment
> of Cassandra; replication among the nodes appears to be the cause.
>
> Please suggest.
>
>
> Below are the contents of  my cassandra Docker file:
>
> FROM ubuntu:16.04
>
>
>
> RUN apt-get update && apt-get install -y python sudo lsof vim dnsutils 
> net-tools && apt-get clean && \
>
> addgroup testuser && useradd -g testuser testuser && usermod --password 
> testuser testuser;
>
>
>
> RUN mkdir -p /opt/test && \
>
> mkdir -p /opt/test/data;
>
>
>
> ADD jre8.tar.gz /opt/test/
>
> ADD apache-cassandra-3.11.0-bin.tar.gz /opt/test/
>
>
>
> RUN chmod 755 -R /opt/test/jre && \
>
> ln -s /opt/test/jre/bin/java /usr/bin/java && \
>
> mv /opt/test/apache-cassandra* /opt/test/cassandra;
>
>
>
> RUN mkdir -p /opt/test/cassandra/logs;
>
>
>
> ENV JAVA_HOME /opt/test/jre
>
> RUN export JAVA_HOME
>
>
>
> COPY version.txt /opt/test/cassandra/version.txt
>
>
>
> WORKDIR /opt/test/cassandra/bin/
>
>
>
> RUN mkdir -p /opt/test/data/saved_caches && \
>
> mkdir -p /opt/test/data/commitlog && \
>
> mkdir -p /opt/test/data/hints && \
>
> chown -R testuser:testuser /opt/test/data && \
>
> chown -R testuser:testuser /opt/test;
>
>
>
> USER testuser
>
>
>
> CMD cp /etc/cassandra/cassandra.yml ../conf/conf.yml && perl -p -e 
> 's/\$\{([^}]+)\}/defined $ENV{$1} ? $ENV{$1} : $&/eg; s/\$\{([^}]+)\}//eg' 
> ../conf/conf.yml > ../conf/cassandra.yaml && rm ../conf/conf.yml && 
> ./cassandra -f
>
> 

Questions about C* performance related to tombstone

2019-04-09 Thread Li, George
Hi,

I have a table defined like this:

CREATE TABLE myTable (
course_id text,
assignment_id text,
assignment_item_id text,
data text,
active boolean,
PRIMARY KEY (course_id, assignment_id, assignment_item_id)
);
i.e. course_id as the partition key and assignment_id, assignment_item_id
as clustering keys.

After data is populated, some delete queries by course_id and assignment_id
occurs, e.g. "DELETE FROM myTable WHERE course_id = 'C' AND assignment_id =
'A1';". This would create tombstones so query "SELECT * FROM myTable WHERE
course_id = 'C';" would be affected, right? Would query "SELECT * FROM
myTable WHERE course_id = 'C' AND assignment_id = 'A2';" be affected too?

For query "SELECT * FROM myTable WHERE course_id = 'C';", to workaround the
tombstone problem, we are thinking about not doing hard deletes, instead
doing soft deletes. So instead of doing "DELETE FROM myTable WHERE
course_id = 'C' AND assignment_id = 'A1';", we do "UPDATE myTable SET
active = false WHERE course_id = 'C' AND assignment_id = 'A1';". Then in
the application, we do query "SELECT * FROM myTable WHERE course_id = 'C';"
and filter out records that have "active" equal to "false". I am not really
sure this would improve performance because C* still has to scan through
all records with the partition key "C". It is just instead of scanning
through X records + Y tombstone records with hard deletes that generate
tombstones, it now scans through X + Y records with soft deletes and no
tombstones. Am I right?

Thanks.

George


RE: [EXTERNAL] Issue while updating a record in 3 node cassandra cluster deployed using kubernetes

2019-04-09 Thread Durity, Sean R
My first suspicion would be to look at the server times in the cluster. It 
looks like other cases where a write occurs (with no errors) but the data is 
not retrieved as expected. If the write occurs with an earlier timestamp than 
the existing data, this is the behavior you would see. The write would occur, 
but it would not be the latest data to be retrieved. The write looks like it 
fails silently, but it actually does exactly what it is designed to do.
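The last-write-wins rule described above can be sketched directly. This mirrors the reconciliation rule, not Cassandra's storage engine; the timestamps are hypothetical microsecond values:

```python
# Last-write-wins cell reconciliation: the higher timestamp wins, so a
# write stamped earlier than the stored cell "succeeds" but is never read
# back -- exactly the silent-failure symptom described in this thread.

def reconcile(existing, incoming):
    """Each cell is a (timestamp, value) pair. Higher timestamp wins;
    exact ties are broken by comparing values (greater value wins)."""
    if incoming[0] > existing[0]:
        return incoming
    if incoming[0] < existing[0]:
        return existing
    return max(existing, incoming, key=lambda cell: cell[1])

stored = (1554800000000000, "current-value")
skewed_update = (1554799000000000, "update-from-node-with-slow-clock")
assert reconcile(stored, skewed_update) == stored  # the update is invisible
```

This is why clock skew between the Kubernetes pods (or between client and server timestamp generators) is the first thing to rule out.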

Sean Durity

From: Mahesh Daksha 
Sent: Tuesday, April 09, 2019 9:10 AM
To: user@cassandra.apache.org
Subject: [EXTERNAL] Issue while updating a record in 3 node cassandra cluster 
deployed using kubernetes


Hello All,

I have a 3 node cassandra cluster with Replication factor as 2 and read-write 
consistency set to QUORUM. We are using Spring data cassandra. All 
infrastructure is deployed using kubernetes.

Now in the normal use case, many records get inserted into the Cassandra
table. Then we try to modify/update one of the records using the save
method of the repository, like below:

ChunkMeta tmpRec = chunkMetaRepository.save(chunkMeta);

After executing the above statement we never see any exception or error,
but the update still fails silently and intermittently: sometimes the
record in the db is updated successfully, other times it is not. When we
print tmpRec it contains the updated and correct value every time, yet the
updated values are not reflected in the db.

We checked the Cassandra transport TRACE logs on all nodes and found that
our queries are logged there and executed without any error or exception.

Another odd observation: everything works perfectly if I use a single
Cassandra node (in Kubernetes) or if we deploy the same infrastructure
using Ansible (it even works with 3 nodes under Ansible).

It looks like the issue is specific to the 3-node Kubernetes deployment of
Cassandra; replication among the nodes appears to be the cause.

Please suggest.




Below are the contents of  my cassandra Docker file:

FROM ubuntu:16.04



RUN apt-get update && apt-get install -y python sudo lsof vim dnsutils 
net-tools && apt-get clean && \

addgroup testuser && useradd -g testuser testuser && usermod --password 
testuser testuser;



RUN mkdir -p /opt/test && \

mkdir -p /opt/test/data;



ADD jre8.tar.gz /opt/test/

ADD apache-cassandra-3.11.0-bin.tar.gz /opt/test/



RUN chmod 755 -R /opt/test/jre && \

ln -s /opt/test/jre/bin/java /usr/bin/java && \

mv /opt/test/apache-cassandra* /opt/test/cassandra;



RUN mkdir -p /opt/test/cassandra/logs;



ENV JAVA_HOME /opt/test/jre

RUN export JAVA_HOME



COPY version.txt /opt/test/cassandra/version.txt



WORKDIR /opt/test/cassandra/bin/



RUN mkdir -p /opt/test/data/saved_caches && \

mkdir -p /opt/test/data/commitlog && \

mkdir -p /opt/test/data/hints && \

chown -R testuser:testuser /opt/test/data && \

chown -R testuser:testuser /opt/test;



USER testuser



CMD cp /etc/cassandra/cassandra.yml ../conf/conf.yml && perl -p -e 
's/\$\{([^}]+)\}/defined $ENV{$1} ? $ENV{$1} : $&/eg; s/\$\{([^}]+)\}//eg' 
../conf/conf.yml > ../conf/cassandra.yaml && rm ../conf/conf.yml && ./cassandra 
-f

Please note that conf.yml is basically the cassandra.yml file, containing
properties related to Cassandra.



Thanks,

Mahesh Daksha




Re: How to monitor datastax driver compression performance?

2019-04-09 Thread Gabriel Giussi
Does tlp-stress allow us to define the size of rows? I will only see the
benefit of compression in terms of request rates if the compression ratio
is significant, i.e. if it requires fewer network round trips.
Could this be done by generating bigger partitions with the -n and -p
parameters, i.e. by decreasing -p?

Also, don't you think the driver should allow configuring compression per
query? One table with wide rows could benefit from compression while
another with a smaller payload might not.

Thanks for your help Jon.


El lun., 8 abr. 2019 a las 19:13, Jon Haddad () escribió:

> If it were me, I'd look at raw request rates (in terms of requests /
> second as well as request latency), network throughput and then some
> flame graphs of both the server and your application:
> https://github.com/jvm-profiling-tools/async-profiler.
>
> I've created an issue in tlp-stress to add compression options for the
> driver: https://github.com/thelastpickle/tlp-stress/issues/67.  If
> you're interested in contributing the feature I think tlp-stress will
> more or less solve the remainder of the problem for you (the load
> part, not the os numbers).
>
> Jon
>
>
>
>
> On Mon, Apr 8, 2019 at 7:26 AM Gabriel Giussi 
> wrote:
> >
> > Hi, I'm trying to test whether adding driver compression will bring me
> any benefit.
> > I understand that the trade-off is less bandwidth for increased CPU
> usage on both the Cassandra nodes (compression) and client nodes
> (decompression), but what are the key metrics, and how do I monitor them
> to prove compression is giving good results?
> > I guess I should look at the latency percentiles reported by
> com.datastax.driver.core.Metrics and at CPU usage, but what about
> bandwidth usage and compression ratio?
> > Should I use tcpdump to capture packet lengths coming from the Cassandra
> nodes? Would something like tcpdump -n "src port 9042 and tcp[13] & 8 != 0" | sed
> -n "s/^.*length \(.*\).*$/\1/p" be enough?
> >
> > Thanks
>
>
>


Issue while updating a record in 3 node cassandra cluster deployed using kubernetes

2019-04-09 Thread Mahesh Daksha
Hello All,

I have a 3 node cassandra cluster with Replication factor as 2 and
read-write consistency set to QUORUM. We are using Spring data cassandra.
All infrastructure is deployed using kubernetes.

Now in the normal use case, many records get inserted into the Cassandra
table. Then we try to modify/update one of the records using the save
method of the repository, like below:

ChunkMeta *tmpRec* = chunkMetaRepository.*save*(chunkMeta);

After executing the above statement we never see any exception or error,
but the update still fails silently and intermittently: sometimes the
record in the db is updated successfully, other times it is not. When we
print *tmpRec* it contains the updated and correct value every time, yet
the updated values are not reflected in the db.

We checked the Cassandra transport TRACE logs on all nodes and found that
our queries are logged there and executed without any error or exception.

Another odd observation: everything works perfectly if I use a single
Cassandra node (in Kubernetes) or if we deploy the same infrastructure
using Ansible (it even works with 3 nodes under Ansible).

It looks like the issue is specific to the 3-node Kubernetes deployment of
Cassandra; replication among the nodes appears to be the cause.

Please suggest.




Below are the contents of  my cassandra Docker file:

FROM ubuntu:16.04

# Base utilities plus a non-root user to run Cassandra
RUN apt-get update && apt-get install -y python sudo lsof vim dnsutils net-tools && \
    apt-get clean && \
    addgroup testuser && useradd -g testuser testuser && \
    usermod --password testuser testuser

RUN mkdir -p /opt/test && \
    mkdir -p /opt/test/data

# Bundled JRE and Cassandra tarballs (extracted automatically by ADD)
ADD jre8.tar.gz /opt/test/
ADD apache-cassandra-3.11.0-bin.tar.gz /opt/test/

RUN chmod 755 -R /opt/test/jre && \
    ln -s /opt/test/jre/bin/java /usr/bin/java && \
    mv /opt/test/apache-cassandra* /opt/test/cassandra

RUN mkdir -p /opt/test/cassandra/logs

# ENV persists JAVA_HOME for subsequent layers and at container runtime
ENV JAVA_HOME /opt/test/jre

COPY version.txt /opt/test/cassandra/version.txt

WORKDIR /opt/test/cassandra/bin/

# Data directories, owned by the non-root user
RUN mkdir -p /opt/test/data/saved_caches && \
    mkdir -p /opt/test/data/commitlog && \
    mkdir -p /opt/test/data/hints && \
    chown -R testuser:testuser /opt/test/data && \
    chown -R testuser:testuser /opt/test

USER testuser

# Substitute ${VAR} placeholders in the mounted template with environment
# variables, drop any unresolved placeholders, then start Cassandra in the
# foreground
CMD cp /etc/cassandra/cassandra.yml ../conf/conf.yml && \
    perl -p -e 's/\$\{([^}]+)\}/defined $ENV{$1} ? $ENV{$1} : $&/eg; s/\$\{([^}]+)\}//eg' ../conf/conf.yml > ../conf/cassandra.yaml && \
    rm ../conf/conf.yml && ./cassandra -f

Please note that conf.yml is basically the cassandra.yaml file, containing
Cassandra-related properties.


Thanks,

Mahesh Daksha


Re: ***UNCHECKED*** Query regarding cassandra column write time set by client Timestamp Generator

2019-04-09 Thread Mahesh Daksha
Thank you Ben and Varun. Will try these approaches.


Re: ***UNCHECKED*** Query regarding cassandra column write time set by client Timestamp Generator

2019-04-09 Thread Varun Barala
I'm not sure about your use case, but a couple of other approaches could
also be considered:

* Every mutation carries its timestamp in the commitlog, so taking backups
of the commitlogs will give you this information.
* At the client side, fetch the existing writetime for those columns from
the db and also log the actual timestamp associated with the current
update/insert statements:
https://docs.datastax.com/en/drivers/java/3.6/com/datastax/driver/core/Statement.html#getDefaultTimestamp--
(though this should only be used for debugging purposes!)
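A related option, if you want full certainty about the timestamp a write
carries: set it explicitly with USING TIMESTAMP and log that same value at
the client. A sketch with hypothetical table and column names:

```cql
-- Sketch (hypothetical names): supply the write timestamp explicitly, in
-- microseconds since the epoch, instead of relying on the driver's
-- generator. The client can then log exactly the value it sent.
UPDATE my_keyspace.my_table
USING TIMESTAMP 1554800000000000
SET some_column = 'new-value'
WHERE id = 'example-id';
```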


Thanks!



Re: ***UNCHECKED*** Query regarding cassandra column write time set by client Timestamp Generator

2019-04-09 Thread Ben Slater
Maybe sstabledump can help you?
https://cassandra.apache.org/doc/4.0/tools/sstable/sstabledump.html

---

*Ben Slater*
*Chief Product Officer*

Read our latest technical blog posts here.

This email has been sent on behalf of Instaclustr Pty. Limited (Australia)
and Instaclustr Inc (USA).

This email and any attachments may contain confidential and legally
privileged information. If you are not the intended recipient, do not copy
or disclose its content, but please reply to this email immediately and
highlight the error to the sender and then immediately delete the message.




Re: ***UNCHECKED*** Query regarding cassandra column write time set by client Timestamp Generator

2019-04-09 Thread Mahesh Daksha
Thanks Ben for your response.
WRITETIME gives information about a column value already residing in the
table. We intend to know the timestamp of the record which is about to be
applied/updated.
This is needed to understand the difference between the timestamp of the
data residing in the table and the one about to overwrite it.

All this information is needed because our update statements are going
silent (not reflecting any changes) in the database, without even returning
an error or exception.

Thanks,
Mahesh Daksha



Re: ***UNCHECKED*** Query regarding cassandra column write time set by client Timestamp Generator

2019-04-09 Thread Ben Slater
Not in the logs, but I think you should be able to use the WRITETIME
function to view it via CQL (see
https://cassandra.apache.org/doc/latest/cql/dml.html#select)
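For reference, a minimal WRITETIME query might look like this (table and
column names are placeholders, not from the original thread):

```cql
-- Returns the write timestamp (microseconds since the epoch) that the
-- server recorded for some_column on the matching row.
SELECT id, some_column, WRITETIME(some_column)
FROM my_keyspace.my_table
WHERE id = 'example-id';
```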

Cheers
Ben





Re: Multi-DC replication and hinted handoff

2019-04-09 Thread Jens Fischer
Hi,

an update: I am pretty sure it is a problem with insufficient bandwidth. I 
can’t be sure because Cassandra does not seem to provide debug information on 
hint creation (only when replaying hints). When the bandwidth issue is solved I 
will try to reproduce the accumulation of hints by artificially limiting the 
bandwidth.

BG
Jens

On 3. Apr 2019, at 01:48, Stefan Miklosovic 
mailto:stefan.mikloso...@instaclustr.com>> 
wrote:

Hi Jens,

I am reading Cassandra The definitive guide and there is a chapter 9 - Reading 
and Writing Data and section The Cassandra Write Path and this sentence in it:

If a replica does not respond within the timeout, it is presumed to be down and 
a hint is stored for the write.

So your node might actually be fine eventually, but it just cannot cope with 
the load and replies too late, after the coordinator already has sufficient 
replies from other replicas. So the coordinator stores a hint for that write 
and that node. I am not sure how this is related to turning off handoffs 
completely. I can do some tests locally, if time allows, to investigate 
various scenarios. There might be some subtle differences.

On Wed, 3 Apr 2019 at 07:19, Jens Fischer 
mailto:j.fisc...@sonnen.de>> wrote:
Yes, Apache Cassandra 3.11.2 (no DSE).

On 2. Apr 2019, at 19:40, sankalp kohli 
mailto:kohlisank...@gmail.com>> wrote:

Are you using OSS C*?

On Fri, Mar 29, 2019 at 1:49 AM Jens Fischer 
mailto:j.fisc...@sonnen.de>> wrote:
Hi,

I have a Cassandra setup with multiple data centres. The vast majority of 
writes are LOCAL_ONE writes to data center DC-A. One node (let's call it A1) 
in DC-A has accumulated large amounts of hint files (~100 GB). In the logs 
of this node I see lots of messages like the following:

INFO  [HintsDispatcher:26] 2019-03-28 01:49:25,217 
HintsDispatchExecutor.java:289 - Finished hinted handoff of file 
db485ac6-8acd-4241-9e21-7a2b540459de-1553419324363-1.hints to endpoint 
/10.10.2.55: db485ac6-8acd-4241-9e21-7a2b540459de

The node 10.10.2.55 is in DC-B; let's call this node B1. There is no indication 
whatsoever that B1 was down: nothing in our monitoring, nothing in the logs of 
B1, nothing in the logs of A1. Are there any other situations where hints to B1 
are stored at A1, other than A1's failure detection marking B1 as down? For 
example, could the reason for the hints be that B1 is overloaded and cannot 
handle the intake from A1? Or that the network connection between DC-A and 
DC-B is too slow?

While researching this I also found the following information on Stack Overflow 
from Ben Slater regarding hints and multi-dc replication:

Another factor here is the consistency level you are using - a LOCAL_* 
consistency level will only require writes to be written to the local DC for 
the operation to be considered a success (and hints will be stored for 
replication to the other DC).
(…)
The hints are the records of writes that have been made in one DC that are not 
yet replicated to the other DC (or even nodes within a DC). I think your 
options to avoid them are: (1) write with ALL or QUORUM (not LOCAL_*) 
consistency - this will slow down your writes but will ensure writes go into 
both DCs before the op completes (2) Don't replicate the data to the second DC 
(by setting the replication factor to 0 for the second DC in the keyspace 
definition) (3) Increase the capacity of the second DC so it can keep up with 
the writes (4) Slow down your writes so the second DC can keep up.

Source: https://stackoverflow.com/a/37382726

This reads like hints are used for “normal” (async) replication between data 
centres, i.e. hints could show up without any nodes being down whatsoever, 
which could explain what I am seeing. Does anyone know more about this? Does 
that mean I will see hints even if I disable hinted handoff?

Any pointers or help are greatly appreciated!

Thanks in advance
Jens



Geschäftsführer: Christoph Ostermann (CEO), Oliver Koch, Steffen Schneider, 
Hermann Schweizer.
Amtsgericht Kempten/Allgäu, Registernummer: 10655, Steuernummer 127/137/50792, 
USt.-IdNr. DE272208908




***UNCHECKED*** Query regarding cassandra column write time set by client Timestamp Generator

2019-04-09 Thread Mahesh Daksha
Hello,

I have configured the timestamp generator at cassandra client as below:

cluster.setTimestampGenerator(new AtomicMonotonicTimestampGenerator());

My Cassandra client inserts and updates a few of the rows in a table.
My question is: where in the Cassandra debug logs can I see the write
timestamp associated with the updated columns in the update query (sent by
the client)? Or is there any other way I can log the same at the client itself?

Basically, I want to see the write time sent by the client to the Cassandra cluster.

Thanks,
Mahesh Daksha