Re: Java GC pauses, reality check

2016-11-26 Thread Benjamin Roth
You are of course right. There is no solution and no language that is a
perfect match for every situation; every solution and language has its own
pros, cons, pitfalls and drawbacks.
Actually, the article you posted points at an aspect of ARC I wasn't aware
of yet.
Nevertheless, GC is an issue for Cassandra, otherwise this thread would not
exist, right? We have to deal with it and get the best out of it.

Another option, besides optimizing your GC: you could check whether
http://www.scylladb.com/ is an option for you.
They rewrote Cassandra from scratch. The goal is to be completely compatible
with Cassandra but much, much faster. Check their benchmarks and their
architecture.
I really do not want to belittle the work of all the Cassandra developers
- they did a great job - but what I have seen there looked very interesting
and promising! By the way, it's written in C++.


2016-11-27 7:06 GMT+01:00 Kant Kodali :

> Automatic Reference Counting sounds like a college-level idea that we have
> all been hearing about since GC was born! There seem to be a bunch of cons
> of ARC, as explained here:
>
> https://www.quora.com/Why-doesnt-Apple-Swift-adopt-the-memory-management-method-of-garbage-collection-like-in-Java
>
> Is maintaining C and C++ apps never a pain? How about versioning and
> statically linked libraries? There is work there too, so it's all pros and
> cons.
>
> "GC is a pain in the ass"? How about seg faults? They aren't any less
> painful :)
>
> It's not only Cassandra that runs on the JVM. The majority of Apache
> projects run on the JVM for a reason.
>
> Bottom line: my point is that there are pros and cons to every language.
> It doesn't make much sense to target one language.
>
>
>
>
>
>
> On Sat, Nov 26, 2016 at 9:31 PM, Benjamin Roth 
> wrote:
>
>> ARC means Automatic Reference Counting, which is done at compile time.
>> E.g. Objective-C and Swift use this technique. There are absolutely no
>> GCs. It's a completely different memory management technique.
>>
>> Why don't I like Java on the server side? Because GC is a pain in the
>> ass. I have been in this business for over 15 years, and
>> running/maintaining apps that are built in C or C++ has never been such
>> a pain.
>>
>> On the other hand, Java is easier to handle for developers. And coding
>> plain C is also a pain.
>>
>> That's why I said it's a philosophical discussion.
>> Anyway, Cassandra runs on Java, so we have to deal with it.
>>
>> On 27.11.2016 05:28, "Kant Kodali"  wrote:
>>
>>> Benjamin Roth: How do you know ARC eliminates GC pauses completely? By
>>> completely I mean no GC pauses whatsoever.
>>>
>>> When you say Java is NOT the first choice for server applications, you
>>> are generalizing too much, I would say, since many of them fall into
>>> that category. Either way, the statement you made is purely subjective.
>>>
>>> On Fri, Nov 25, 2016 at 2:41 PM, Benjamin Roth 
>>> wrote:
>>>
 Lol. The counterproof is to use another memory model, like ARC. That's
 why I personally think Java is NOT the first choice for server
 applications. But that's a philosophical discussion.

 On 25.11.2016 23:38, "Kant Kodali"  wrote:

> +1 to Chris Lohfink's response.
>
> I would also restate the sentence "java GC pauses are pretty much a fact
> of life" as "pauses in any GC-based system are pretty much a fact of
> life".
>
> I would be more than happy to see someone disprove that.
>
>
>
> On Fri, Nov 25, 2016 at 1:41 PM, Chris Lohfink 
> wrote:
>
>> No tuning will eliminate GCs.
>>
>> 20-30 seconds is horrific and out of the ordinary. Most likely you are
>> implementing antipatterns and/or are poorly configured. Sub-1s is
>> realistic, but some workloads may still require tuning to maintain that.
>> Some workloads are very unfriendly to GCs, though (i.e. heavy
>> tombstones, very wide partitions).
>>
>> Chris
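
A concrete starting point for the measurement this implies, as a sketch
(assuming a stock install where JVM flags live in cassandra-env.sh; exact
flag support varies by JVM version):

    # cassandra-env.sh: enable GC logging so pause frequency and length
    # can be measured before any tuning is attempted
    JVM_OPTS="$JVM_OPTS -Xloggc:/var/log/cassandra/gc.log"
    JVM_OPTS="$JVM_OPTS -XX:+PrintGCDetails -XX:+PrintGCDateStamps"
    # records every stop-the-world safepoint, not only GC, so long stalls
    # of any origin show up in the log
    JVM_OPTS="$JVM_OPTS -XX:+PrintGCApplicationStoppedTime"

    # then inspect the worst pauses:
    grep "Total time for which application threads were stopped" \
        /var/log/cassandra/gc.log

With those numbers in hand it is much easier to tell workload-driven pauses
(tombstones, wide partitions) from plain misconfiguration.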
>>
>> On Fri, Nov 25, 2016 at 3:25 PM, S Ahmed 
>> wrote:
>>
>>> Hello!
>>>
>>> From what I understand, java GC pauses are pretty much a fact of life,
>>> but you can tune the JVM to reduce the frequency and length of GC
>>> pauses.
>>>
>>> When using Cassandra, how frequent or long have these pauses been known
>>> to be?  Even with tuning, is it safe to assume they cannot be
>>> eliminated?
>>>
>>> Would a 20-30 second pause be something out of the ordinary?
>>>
>>> Thanks.
>>>
>>
>>
>
>>>
>


-- 
Benjamin Roth
Prokurist

Jaumo GmbH · www.jaumo.com
Wehrstraße 46 · 73035 Göppingen · Germany
Phone +49 7161 304880-6 · Fax +49 7161 304880-1
AG Ulm · HRB 731058 · Managing Director: Jens Kammerer


Re: Java GC pauses, reality check

2016-11-26 Thread Kant Kodali
Automatic Reference Counting sounds like a college-level idea that we have
all been hearing about since GC was born! There seem to be a bunch of cons
of ARC, as explained here:

https://www.quora.com/Why-doesnt-Apple-Swift-adopt-the-memory-management-method-of-garbage-collection-like-in-Java

Is maintaining C and C++ apps never a pain? How about versioning and
statically linked libraries? There is work there too, so it's all pros and
cons.

"GC is a pain in the ass"? How about seg faults? They aren't any less
painful :)

It's not only Cassandra that runs on the JVM. The majority of Apache
projects run on the JVM for a reason.

Bottom line: my point is that there are pros and cons to every language.
It doesn't make much sense to target one language.








Re: Java GC pauses, reality check

2016-11-26 Thread Benjamin Roth
ARC means Automatic Reference Counting, which is done at compile time.
E.g. Objective-C and Swift use this technique. There are absolutely no GCs.
It's a completely different memory management technique.

Why don't I like Java on the server side? Because GC is a pain in the ass.
I have been in this business for over 15 years, and running/maintaining
apps that are built in C or C++ has never been such a pain.

On the other hand, Java is easier to handle for developers. And coding
plain C is also a pain.

That's why I said it's a philosophical discussion.
Anyway, Cassandra runs on Java, so we have to deal with it.



Re: Java GC pauses, reality check

2016-11-26 Thread Kant Kodali
@Harikrishnan Pillai: How many nodes are you guys running? And what are
your approximate read and write sizes?

On Fri, Nov 25, 2016 at 7:32 PM, Harikrishnan Pillai <
hpil...@walmartlabs.com> wrote:

> We are running Azul Zing in prod with 1 million reads/s and 100K writes/s.
> We have never had a major GC above 10 ms.
>
> Sent from my iPhone
>
> > On Nov 25, 2016, at 3:49 PM, Martin Schröder  wrote:
> >
> > 2016-11-25 23:38 GMT+01:00 Kant Kodali :
> >> I would also restate the sentence "java GC pauses are pretty much a
> >> fact of life" as "pauses in any GC-based system are pretty much a fact
> >> of life".
> >>
> >> I would be more than happy to see someone disprove that.
> >
> > Azul disagrees.
> > https://www.azul.com/products/zing/pgc/
> >
> > Best
> >   Martin
>


Re: Java GC pauses, reality check

2016-11-26 Thread Kant Kodali
Good to know about Zing! I will have to take a look.



Re: Java GC pauses, reality check

2016-11-26 Thread Kant Kodali
Benjamin Roth: How do you know ARC eliminates GC pauses completely? By
completely I mean no GC pauses whatsoever.

When you say Java is NOT the first choice for server applications, you are
generalizing too much, I would say, since many of them fall into that
category. Either way, the statement you made is purely subjective.



Re: Does recovery continue after truncating a table?

2016-11-26 Thread Ben Slater
By “undocumented limitation”, I meant that “TRUNCATE” is mainly used in
development and testing, not in production scenarios, so a sufficient fix
(and certainly a better-than-nothing fix) might be just to document that if
you issue a TRUNCATE while there are still hinted hand-offs pending, the
hints replayed after the truncate will bring the data back to life. Of
course, an actual fix would be better.

Cheers
Ben
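
A minimal sketch of the defensive sequence such documentation might suggest
in the meantime (assumptions: the $nodeN_ip variables from the reproduction
script in this thread, ssh access to each node, and nodetool truncatehints,
which drops all hints stored on a node, not just those for one table):

    # Drain pending hints on every node before truncating, so a later hint
    # replay cannot resurrect the truncated rows. NOTE: this drops ALL
    # hints held on each node, not only this table's.
    for ip in "$node1_ip" "$node2_ip" "$node3_ip"; do
        ssh "$ip" nodetool truncatehints
    done
    cqlsh "$node1_ip" -e "TRUNCATE TABLE testdb.testtbl;"

This narrows the window rather than closing it: hints handed off between
the drain and the TRUNCATE could still replay afterwards.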


Re: Java GC pauses, reality check

2016-11-26 Thread Oleksandr Shulgin
On Nov 26, 2016 20:52, "Graham Sanderson"  wrote:

It was removed in the 3.0.x line, but not in the 3.x line (post 9472), as
far as I can tell. It looks to be available in the 3.11 and 3.X branches.


Thanks, you are correct. I was confused.

On Nov 26, 2016, at 1:17 PM, Oleksandr Shulgin 
wrote:

On Nov 26, 2016 20:04, "Graham Sanderson"  wrote:

Not AFAIK; https://issues.apache.org/jira/browse/CASSANDRA-9472 is marked
as resolved in 3.4, though we are not running it so I can’t say much about
it.


But I was referring to https://issues.apache.org/jira/browse/CASSANDRA-11039
which removed it again in 3.10 and 3.0.10.

--
Alex

It looks like Zing is no longer priced per core, which was a showstopper
for us; it is now priced per server, which may affect others differently.

Note, in fact ironically, that running 2.1.x with off-heap memtables we had
some of our JVMs running for over a year, which made us hit
https://issues.apache.org/jira/browse/CASSANDRA-10969 when we restarted
some nodes for other reasons.

On Nov 26, 2016, at 12:07 AM, Oleksandr Shulgin <
oleksandr.shul...@zalando.de> wrote:

On Nov 25, 2016 23:47, "Graham Sanderson"  wrote:

If you are seeing 25-30 second GC pauses then (unless you are very badly
configured) you are seeing full GC under CMS (though G1 may have similar
problems).

With CMS, eventual fragmentation causing promotion failure is inevitable
(unless you cycle your nodes before it happens). Either your heap has way
too big an old gen, or too small a young gen (and you need pretty hefty
boxes to run with a large young gen - say in the 4-8G range - without
young collections taking too long).

Depending on your C* version, I would highly recommend off-heap memtables.
With those we were able to reduce our heap sizes considerably, despite
having large throughput on a smallish number of nodes.


Aren't offheap memtables discontinued in the most recent releases of 3.0
and 3.x for a good reason? I thought using them could lead to segfaults?

--
Alex

I recommend reading this if you use CMS:
http://blog.ragozin.info/2011/10/java-cg-hotspots-cms-and-heap.html. Also
note that if you see a lot of objects of size 131074 in promotion failures,
then memtables are the problem - you can try to flush them sooner, but
moving them off heap works better, I think.
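
To make those knobs concrete, a sketch (assumptions: C* 2.1.x under CMS
with flags set in cassandra-env.sh; the sizes are illustrative, not
recommendations):

    # cassandra-env.sh: a larger young gen so short-lived request garbage
    # dies before it can be promoted into (and fragment) the old gen
    JVM_OPTS="$JVM_OPTS -Xms8G -Xmx8G -Xmn2G"
    JVM_OPTS="$JVM_OPTS -XX:+UseParNewGC -XX:+UseConcMarkSweepGC"
    # surface promotion failures, e.g. the 131074-sized memtable buffers
    # mentioned above
    JVM_OPTS="$JVM_OPTS -XX:+PrintGCDetails -XX:+PrintPromotionFailure"

The off-heap memtables mentioned above are switched on in cassandra.yaml
via memtable_allocation_type: offheap_objects on 2.1; which 3.x releases
still support that value is exactly what the JIRA tickets earlier in this
thread are debating.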



Re: Java GC pauses, reality check

2016-11-26 Thread Graham Sanderson
It was removed in the 3.0.x line, but not in the 3.x line (post 9472), as
far as I can tell. It looks to be available in the 3.11 and 3.X branches.






Configure NTP for Cassandra

2016-11-26 Thread Anuj Wadehra
Hi,
One popular NTP setup recommended for Cassandra users is described at
https://blog.logentries.com/2014/03/synchronizing-clocks-in-a-cassandra-cluster-pt-2-solutions/

Summary of the article: the setup recommends a dedicated pool of internal
NTP servers which are associated as peers to provide an HA NTP service.
Cassandra nodes sync to this dedicated pool but define one internal NTP
server as the preferred server to ensure tight relative clock
synchronization. The internal NTP servers sync to external NTP servers.
My questions:
1. If my ISP provides a pool of reliable NTP servers, should I set up my
own internal servers anyway, or can I sync Cassandra nodes directly to the
ISP-provided servers and select one of them as preferred for relative clock
synchronization?

I agree that if you have to rely on the public NTP pool, which selects
random servers for sync, having an internal NTP server pool is justified
for getting tight relative sync, as described in the blog.

2. As per my understanding, peer association is ONLY for the backup
scenario: if a peer loses its time synchronization source, the other peers
can be used for time synchronization, thus providing an HA service. But
when everything is OK (the happy path), does defining NTP servers synced
from different sources as peers lead them to converge their times, as
mentioned in some forums?
E.g. if A and B are peers and their times are 9:00:00 and 9:00:10 after
syncing with their respective time sources, will they converge their clocks
to 9:00:05?
I doubt the above claim regarding time convergence, and no formal doc says
that. Comments?

Thanks,
Anuj
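
For reference, the node-side half of the setup described above can be
sketched as below (the 10.0.0.x addresses are assumptions for the internal
NTP servers; a real ntp.conf also needs driftfile and restrict lines):

    # /etc/ntp.conf on each Cassandra node: sync to the internal pool,
    # with one server marked "prefer" for tight relative synchronization
    server 10.0.0.1 iburst prefer
    server 10.0.0.2 iburst
    server 10.0.0.3 iburst

    # /etc/ntp.conf on internal NTP server 10.0.0.1: sync upstream and
    # peer with the other internal servers so sync survives source loss
    server 0.pool.ntp.org iburst
    peer 10.0.0.2
    peer 10.0.0.3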


Re: Java GC pauses, reality check

2016-11-26 Thread Oleksandr Shulgin
On Nov 26, 2016 20:04, "Graham Sanderson"  wrote:

Not AFAIK; https://issues.apache.org/jira/browse/CASSANDRA-9472 is marked
as resolved in 3.4, though we are not running it so I can’t say much about
it.


But I was referring to https://issues.apache.org/jira/browse/CASSANDRA-11039
which removed it again in 3.10 and 3.0.10.

--
Alex



Re: Java GC pauses, reality check

2016-11-26 Thread Graham Sanderson
Not AFAIK; https://issues.apache.org/jira/browse/CASSANDRA-9472 is marked
as resolved in 3.4, though we are not running it so I can’t say much about
it.

It looks like Zing is no longer priced per core, which was a showstopper
for us; it is now priced per server, which may affect others differently.

Note, in fact ironically, that running 2.1.x with off-heap memtables we had
some of our JVMs running for over a year, which made us hit
https://issues.apache.org/jira/browse/CASSANDRA-10969 when we restarted
some nodes for other reasons.

> 





Re: Does recovery continue after truncating a table?

2016-11-26 Thread Hiroyuki Yamada
Hi Yuji and Ben,

I tried out this revised script and hit the same issue, too.
I think it's definitely a bug that should be solved ASAP.

>Ben
What do you mean by "an undocumented limitation"?

Thanks,
Hiro

On Sat, Nov 26, 2016 at 3:13 PM, Ben Slater  wrote:
> Nice detective work! Seems to me that it’s at best an undocumented
> limitation and potentially could be viewed as a bug - maybe log another
> JIRA?
>
> One note - there is a nodetool truncatehints command that could be used
> to clear out the hints
> (http://cassandra.apache.org/doc/latest/tools/nodetool/truncatehints.html?highlight=truncate).
> However, it seems to clear all hints on a particular endpoint, not just
> those for a specific table.
>
> Cheers
> Ben
>
> On Fri, 25 Nov 2016 at 17:42 Yuji Ito  wrote:
>>
>> Hi all,
>>
>> I revised the script to reproduce the issue.
>> I think the issue happens more frequently than before.
>> Killing another node is added to the previous script.
>>
>>  [script] 
>> #!/bin/sh
>>
>> node1_ip=
>> node2_ip=
>> node3_ip=
>> node2_user=
>> node3_user=
>> rows=1
>>
>> echo "consistency quorum;" > init_data.cql
>> for key in $(seq 0 $(expr $rows - 1))
>> do
>> echo "insert into testdb.testtbl (key, val) values($key, ) IF NOT
>> EXISTS;" >> init_data.cql
>> done
>>
>> while true
>> do
>> echo "truncate the table"
>> cqlsh $node1_ip -e "truncate table testdb.testtbl" > /dev/null 2>&1
>> if [ $? -ne 0 ]; then
>> echo "truncating failed"
>> continue
>> else
>> break
>> fi
>> done
>>
>> echo "kill C* process on node3"
>> pdsh -l $node3_user -R ssh -w $node3_ip "ps auxww | grep CassandraDaemon |
>> awk '{if (\$13 ~ /cassand/) print \$2}' | xargs sudo kill -9"
>>
>> echo "insert $rows rows"
>> cqlsh $node1_ip -f init_data.cql > insert_log 2>&1
>>
>> echo "restart C* process on node3"
>> pdsh -l $node3_user -R ssh -w $node3_ip "sudo /etc/init.d/cassandra start"
>>
>> while true
>> do
>> echo "truncate the table again"
>> cqlsh $node1_ip -e "truncate table testdb.testtbl"
>> if [ $? -ne 0 ]; then
>> echo "truncating failed"
>> continue
>> else
>> echo "truncation succeeded!"
>> break
>> fi
>> done
>>
>> echo "kill C* process on node2"
>> pdsh -l $node2_user -R ssh -w $node2_ip "ps auxww | grep CassandraDaemon |
>> awk '{if (\$13 ~ /cassand/) print \$2}' | xargs sudo kill -9"
>>
>> cqlsh $node1_ip --request-timeout 3600 -e "consistency serial; select
>> count(*) from testdb.testtbl;"
>> sleep 10
>> cqlsh $node1_ip --request-timeout 3600 -e "consistency serial; select
>> count(*) from testdb.testtbl;"
>>
>> echo "restart C* process on node2"
>> pdsh -l $node2_user -R ssh -w $node2_ip "sudo /etc/init.d/cassandra start"
>>
>>
>> Thanks,
>> yuji
>>
>>
>> On Fri, Nov 18, 2016 at 7:52 PM, Yuji Ito  wrote:
>>>
>>> I investigated the source code and the logs of the killed node.
>>> I suspect that unexpected writes are executed while truncation is in
>>> progress.
>>>
>>> Some writes were executed after the flush (the first flush) in
>>> truncation, and these writes could be read.
>>> These writes were requested as MUTATIONs by another node for hinted
>>> handoff.
>>> Their data was stored in a new memtable and flushed (the second flush)
>>> to a new SSTable before the snapshot in truncation.
>>> So, the truncation discarded only the old SSTables, not the new SSTable.
>>> That's because the ReplayPosition used for discarding SSTables was that
>>> of the first flush.
>>>
>>> I copied some parts of the log below.
>>> Lines starting with "##" are my comments.
>>> The point is that the ReplayPosition is moved forward by the second
>>> flush.
>>> That means some writes were executed after the first flush.
>>>
>>> == log ==
>>> ## started truncation
>>> TRACE [SharedPool-Worker-16] 2016-11-17 08:36:04,612
>>> ColumnFamilyStore.java:2790 - truncating testtbl
>>> ## the first flush started before truncation
>>> DEBUG [SharedPool-Worker-16] 2016-11-17 08:36:04,612
>>> ColumnFamilyStore.java:952 - Enqueuing flush of testtbl: 591360 (0%)
>>> on-heap, 0 (0%) off-heap
>>> INFO  [MemtableFlushWriter:1] 2016-11-17 08:36:04,613 Memtable.java:352 -
>>> Writing Memtable-testtbl@1863835308(42.625KiB serialized bytes, 2816 ops,
>>> 0%/0% of on/off-heap limit)
>>> ...
>>> DEBUG [MemtableFlushWriter:1] 2016-11-17 08:36:04,973 Memtable.java:386 -
>>> Completed flushing
>>> /var/lib/cassandra/data/testdb/testtbl-562848f0a55611e68b1451065d58fdfb/tmp-lb-1-big-Data.db
>>> (17.651KiB) for commitlog position ReplayPosition(segmentId=1479371760395,
>>> position=315867)
>>> ## this ReplayPosition was used for discarding SSTables
>>> ...
>>> TRACE [MemtablePostFlush:1] 2016-11-17 08:36:05,022 CommitLog.java:298 -
>>> discard completed log segments for ReplayPosition(segmentId=1479371760395,
>>> position=315867), table 562848f0-a556-11e6-8b14-51065d58fdfb
>>> ## end of the first flush
>>> DEBUG [SharedPool-Worker-16] 2016-11-17 08:36:05,028
>>> 

Rust Cassandra Driver?

2016-11-26 Thread Jan Algermissen

Hi,

I am looking for a driver for the Rust language. I have found some
projects, but they seem quite abandoned.


Can someone point me to the driver that makes the most sense to look at,
or to help work on?


Cheers,

Jan