Secondary index creation causes C* OOM

2018-01-09 Thread Peng Xiao
Dear All,


We hit OOM on some C* nodes during secondary index creation with C* 2.1.18.
As per https://issues.apache.org/jira/browse/CASSANDRA-12796, the flush writer
can be blocked by an index rebuild, but we still have some confusion:
1. We are not sure whether secondary index creation is the same operation as an index rebuild.
2. We noticed that memtable flushes still appear to be working, unlike what
CASSANDRA-12796 describes, but the CompactionExecutor pending count keeps increasing.
3. Does the block affect only the specific table on which the secondary index is being created?


Could anyone please explain?
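(For reference, a quick way to watch the symptoms described above, using
standard nodetool commands; what counts as "normal" depends on your workload:)

# pending/blocked tasks per thread pool, including CompactionExecutor
# and the memtable flush writers:
nodetool tpstats
# index builds typically show up here as a running "Secondary index build":
nodetool compactionstats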





Thanks

Re: Meltdown/Spectre Linux patch - Performance impact on Cassandra?

2018-01-09 Thread Jeff Jirsa
Longer than that. Years. Check /proc/cpuinfo
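For example (a minimal check; pcid/invpcid are the standard kernel flag names,
and -w keeps "pcid" from also matching inside "invpcid"):

# count how many logical CPUs report PCID/INVPCID support
grep -o -w -e pcid -e invpcid /proc/cpuinfo | sort | uniq -c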


-- 
Jeff Jirsa


> On Jan 9, 2018, at 11:19 PM, daemeon reiydelle  wrote:
> 
> Good luck with that. PCID out since mid-2017 as I recall?
> 
> 
> Daemeon (Dæmœn) Reiydelle
> USA 1.415.501.0198
> 
> On Jan 9, 2018 10:31 AM, "Dor Laor"  wrote:
> Make sure you pick instances with PCID CPU capability; their TLB flush
> overhead is much smaller.
> 
>> On Tue, Jan 9, 2018 at 2:04 AM, Steinmaurer, Thomas 
>>  wrote:
>> Quick follow up.
>> 
>>  
>> 
>> Others in AWS reporting/seeing something similar, e.g.: 
>> https://twitter.com/BenBromhead/status/950245250504601600
>> 
>>  
>> 
>> So, while we have seen a relative CPU increase of ~50% since Jan 4, 2018,
>> we have now also applied a kernel update at OS/VM level on a single node
>> (loadtest and not production though), so that node is more or less double
>> patched now. The additional CPU impact of the OS/VM-level kernel patching is
>> more or less negligible, so this looks highly hypervisor-related.
>> 
>>  
>> 
>> Regards,
>> 
>> Thomas
>> 
>>  
>> 
>> From: Steinmaurer, Thomas [mailto:thomas.steinmau...@dynatrace.com] 
>> Sent: Friday, 05 January 2018 12:09
>> To: user@cassandra.apache.org
>> Subject: Meltdown/Spectre Linux patch - Performance impact on Cassandra?
>> 
>>  
>> 
>> Hello,
>> 
>>  
>> 
>> Has anybody some experience/results yet on whether a Linux kernel patched
>> for Meltdown/Spectre affects Cassandra performance negatively?
>> 
>>  
>> 
>> In production, with all nodes running in AWS on m4.xlarge, we see up to a 50%
>> relative CPU increase (e.g. AVG CPU from 40% => 60%) since Jan 4, 2018, most
>> likely correlating with Amazon finishing patching of the underlying
>> hypervisor infrastructure …
>> 
>>  
>> 
>> Anybody else seeing a similar CPU increase?
>> 
>>  
>> 
>> Thanks,
>> 
>> Thomas
>> 
>>  
>> 


Re: Meltdown/Spectre Linux patch - Performance impact on Cassandra?

2018-01-09 Thread Dor Laor
Hard to tell from the first 10 Google search results which Intel CPUs
have it, so I went and asked my /proc/cpuinfo; it turns out my >1-year-old
Dell XPS laptop has it. AWS's i3 has it too.

flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat
pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb
rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology
nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq dtes64 monitor
ds_cpl vmx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic
movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm
3dnowprefetch cpuid_fault epb invpcid_single pti intel_pt tpr_shadow vnmi
flexpriority ept vpid fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid
mpx rdseed adx smap clflushopt xsaveopt xsavec xgetbv1 xsaves dtherm ida
arat pln pts hwp hwp_notify hwp_act_window hwp_epp
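The flags that matter in the dump above are pcid, invpcid, and pti (the last
one indicating the page-table-isolation patch is active). A one-liner to pull
just those out (a sketch):

grep -m1 '^flags' /proc/cpuinfo | tr ' ' '\n' | grep -x -e pcid -e invpcid -e pti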


On Tue, Jan 9, 2018 at 11:19 PM, daemeon reiydelle 
wrote:

> Good luck with that. PCID out since mid-2017 as I recall?
>
>
> Daemeon (Dæmœn) Reiydelle
> USA 1.415.501.0198
>
> On Jan 9, 2018 10:31 AM, "Dor Laor"  wrote:
>
> Make sure you pick instances with PCID CPU capability; their TLB flush
> overhead is much smaller.
>
> On Tue, Jan 9, 2018 at 2:04 AM, Steinmaurer, Thomas <
> thomas.steinmau...@dynatrace.com> wrote:
>
>> Quick follow up.
>>
>>
>>
>> Others in AWS reporting/seeing something similar, e.g.:
>> https://twitter.com/BenBromhead/status/950245250504601600
>>
>>
>>
>> So, while we have seen a relative CPU increase of ~50% since Jan 4,
>> 2018, we have now also applied a kernel update at OS/VM level on a single
>> node (loadtest and not production though), so that node is more or less
>> double patched now. The additional CPU impact of the OS/VM-level kernel
>> patching is more or less negligible, so this looks highly hypervisor-related.
>>
>>
>>
>> Regards,
>>
>> Thomas
>>
>>
>>
>> From: Steinmaurer, Thomas [mailto:thomas.steinmau...@dynatrace.com]
>> Sent: Friday, 05 January 2018 12:09
>> To: user@cassandra.apache.org
>> Subject: Meltdown/Spectre Linux patch - Performance impact on Cassandra?
>>
>>
>>
>> Hello,
>>
>>
>>
>> Has anybody some experience/results yet on whether a Linux kernel patched
>> for Meltdown/Spectre affects Cassandra performance negatively?
>>
>>
>>
>> In production, with all nodes running in AWS on m4.xlarge, we see up to a
>> 50% relative CPU increase (e.g. AVG CPU from 40% => 60%) since Jan 4, 2018,
>> most likely correlating with Amazon finishing patching of the underlying
>> hypervisor infrastructure …
>>
>>
>>
>> Anybody else seeing a similar CPU increase?
>>
>>
>>
>> Thanks,
>>
>> Thomas
>>
>>
>>


Re: Meltdown/Spectre Linux patch - Performance impact on Cassandra?

2018-01-09 Thread daemeon reiydelle
Good luck with that. PCID out since mid-2017 as I recall?


Daemeon (Dæmœn) Reiydelle
USA 1.415.501.0198

On Jan 9, 2018 10:31 AM, "Dor Laor"  wrote:

Make sure you pick instances with PCID CPU capability; their TLB flush
overhead is much smaller.

On Tue, Jan 9, 2018 at 2:04 AM, Steinmaurer, Thomas <
thomas.steinmau...@dynatrace.com> wrote:

> Quick follow up.
>
>
>
> Others in AWS reporting/seeing something similar, e.g.:
> https://twitter.com/BenBromhead/status/950245250504601600
>
>
>
> So, while we have seen a relative CPU increase of ~50% since Jan 4,
> 2018, we have now also applied a kernel update at OS/VM level on a single
> node (loadtest and not production though), so that node is more or less
> double patched now. The additional CPU impact of the OS/VM-level kernel
> patching is more or less negligible, so this looks highly hypervisor-related.
>
>
>
> Regards,
>
> Thomas
>
>
>
> From: Steinmaurer, Thomas [mailto:thomas.steinmau...@dynatrace.com]
> Sent: Friday, 05 January 2018 12:09
> To: user@cassandra.apache.org
> Subject: Meltdown/Spectre Linux patch - Performance impact on Cassandra?
>
>
>
> Hello,
>
>
>
> Has anybody some experience/results yet on whether a Linux kernel patched
> for Meltdown/Spectre affects Cassandra performance negatively?
>
>
>
> In production, with all nodes running in AWS on m4.xlarge, we see up to a 50%
> relative CPU increase (e.g. AVG CPU from 40% => 60%) since Jan 4, 2018,
> most likely correlating with Amazon finishing patching of the underlying
> hypervisor infrastructure …
>
>
>
> Anybody else seeing a similar CPU increase?
>
>
>
> Thanks,
>
> Thomas
>
>
>


Re: Repair fails for unknown reason

2018-01-09 Thread kurt greaves
The parent repair session will be on the node that you kicked off the
repair on. Are the logs above from that node? Can you make it a bit clearer
how many nodes are involved, and share the corresponding logs from each node?
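(If it helps, the parent repair session failure is usually easy to locate on
the node that initiated the repair with something like the following; the log
paths are the common package-install defaults, so adjust as needed:)

grep -i 'parent repair session' /var/log/cassandra/system.log /var/log/cassandra/debug.log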

On 9 January 2018 at 09:49, Hannu Kröger  wrote:

> We have run restarts on the cluster and that doesn’t seem to help at all.
>
> We ran repair separately for each table, and that usually goes through, but
> running a repair on the whole keyspace doesn’t.
>
> Anything anyone?
>
> Hannu
>
>
> On 3 Jan 2018, at 23:24, Hannu Kröger  wrote:
>
> I can certainly try that. No problem there.
>
> However wouldn’t we then get this kind of errors if that was the case:
>
> java.lang.RuntimeException: Cannot start multiple repair sessions over the 
> same sstables
>
> ?
>
> Hannu
>
> On 3 Jan 2018, at 20:50, Nandakishore Tokala <
> nandakishore.tok...@gmail.com> wrote:
>
> hi Hannu,
>
> I think some of the repairs are hanging there. please restart all the
> nodes in the  cluster and start the repair
>
>
> Thanks
> Nanda
>
> On Wed, Jan 3, 2018 at 9:35 AM, Hannu Kröger  wrote:
>
>> Additional notes:
>>
>> 1) If I run the repair just on those tables, it works fine
>> 2) Those tables are empty
>>
>> Hannu
>>
>> > On 3 Jan 2018, at 18:23, Hannu Kröger  wrote:
>> >
>> > Hello,
>> >
>> > Situation is as follows:
>> >
>> > Repair was started on node X on this keyspace with --full --pr. Repair
>> > fails on node Y.
>> >
>> > Node Y has debug logging on (DEBUG on org.apache.cassandra) and I’m
>> looking at the debug.log. I see following messages related to this repair
>> request:
>> >
>> > ---
>> > DEBUG [AntiEntropyStage:1] 2018-01-02 17:52:12,530 RepairMessageVerbHandler.java:114 - Validating ValidationRequest{gcBefore=1511473932} org.apache.cassandra.repair.messages.ValidationRequest@5a17430c
>> > DEBUG [ValidationExecutor:4] 2018-01-02 17:52:12,531 StorageService.java:3321 - Forcing flush on keyspace mykeyspace, CF mytable
>> > DEBUG [MemtablePostFlush:54] 2018-01-02 17:52:12,531 ColumnFamilyStore.java:954 - forceFlush requested but everything is clean in mytable
>> > ERROR [ValidationExecutor:4] 2018-01-02 17:52:12,532 Validator.java:268 - Failed creating a merkle tree for [repair #1df000a0-effa-11e7-8361-b7c9edfbfc33 on mykeyspace/mytable, [(6917529027641081856,-9223372036854775808]]], /123.123.123.123 (see log for details)
>> > ---
>> >
>> > then the same appears for another table, and after that the following,
>> > which indicates that the repair “master” has basically told it to abort, right?
>> >
>> > ---
>> > DEBUG [AntiEntropyStage:1] 2018-01-02 17:52:12,563 RepairMessageVerbHandler.java:142 - Got anticompaction request AnticompactionRequest{parentRepairSession=1de949e0-effa-11e7-8361-b7c9edfbfc33} org.apache.cassandra.repair.messages.AnticompactionRequest@5dc8beea
>> > ERROR [AntiEntropyStage:1] 2018-01-02 17:52:12,563 RepairMessageVerbHandler.java:168 - Got error, removing parent repair session
>> > ERROR [AntiEntropyStage:1] 2018-01-02 17:52:12,564 CassandraDaemon.java:228 - Exception in thread Thread[AntiEntropyStage:1,5,main]
>> > java.lang.RuntimeException: java.lang.RuntimeException: Parent repair session with id = 1de949e0-effa-11e7-8361-b7c9edfbfc33 has failed.
>> > at org.apache.cassandra.repair.RepairMessageVerbHandler.doVerb(RepairMessageVerbHandler.java:171) ~[apache-cassandra-3.11.0.jar:3.11.0]
>> > at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:66) ~[apache-cassandra-3.11.0.jar:3.11.0]
>> > at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) ~[na:1.8.0_111]
>> > at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[na:1.8.0_111]
>> > at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) ~[na:1.8.0_111]
>> > at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_111]
>> > at org.apache.cassandra.concurrent.NamedThreadFactory.lambda$threadLocalDeallocator$0(NamedThreadFactory.java:81) [apache-cassandra-3.11.0.jar:3.11.0]
>> > at java.lang.Thread.run(Thread.java:745) ~[na:1.8.0_111]
>> > Caused by: java.lang.RuntimeException: Parent repair session with id = 1de949e0-effa-11e7-8361-b7c9edfbfc33 has failed.
>> > at org.apache.cassandra.service.ActiveRepairService.getParentRepairSession(ActiveRepairService.java:409) ~[apache-cassandra-3.11.0.jar:3.11.0]
>> > at org.apache.cassandra.service.ActiveRepairService.doAntiCompaction(ActiveRepairService.java:444) ~[apache-cassandra-3.11.0.jar:3.11.0]
>> > at org.apache.cassandra.repair.RepairMessageVerbHandler.doVerb(RepairMessageVerbHandler.java:143) ~[apache-cassandra-3.11.0.jar:3.11.0]
>> > ... 7 common frames omitted
>> > ---
>> >
>> > But that 

Re: Quick question on TWCS

2018-01-09 Thread Jeff Jirsa
Full repair on TWCS maintains proper bucketing



-- 
Jeff Jirsa


> On Jan 9, 2018, at 5:36 PM, "wxn...@zjqunshuo.com"  
> wrote:
> 
> Hi All,
> If using TWCS, will a full repair trigger major compaction and then compact
> all the sstable files into big ones, regardless of the time bucket?
> 
> Thanks,
> -Simon


Quick question on TWCS

2018-01-09 Thread wxn...@zjqunshuo.com
Hi All,
If using TWCS, will a full repair trigger major compaction and then compact all
the sstable files into big ones, regardless of the time bucket?

Thanks,
-Simon


Re: 3.0.15 or 3.11.1

2018-01-09 Thread Nate McCall
>
> Can you please provide some JIRAs for superior fixes and performance
> improvements which are present in 3.11.1 but are missing in 3.0.15.
>
>
For the security conscious, CASSANDRA-11695 allows you to use Cassandra's
authentication and authorization to lock down JMX/nodetool access instead
of relying on per-node configuration.
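Roughly what that looks like in practice (a sketch based on the JMX security
work in CASSANDRA-11695; verify the exact property names against NEWS.txt for
your version, and the role name here is hypothetical). In cassandra-env.sh:

JVM_OPTS="$JVM_OPTS -Dcassandra.jmx.remote.login.config=CassandraLogin"
JVM_OPTS="$JVM_OPTS -Djava.security.auth.login.config=$CASSANDRA_HOME/conf/cassandra-jaas.config"
JVM_OPTS="$JVM_OPTS -Dcassandra.jmx.authorizer=org.apache.cassandra.auth.jmx.AuthorizationProxy"

Then grant JMX permissions to a role through CQL, e.g.:

cqlsh -u cassandra -e "GRANT SELECT ON ALL MBEANS TO ops_role;"
cqlsh -u cassandra -e "GRANT EXECUTE ON ALL MBEANS TO ops_role;"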

-- 
-
Nate McCall
Wellington, NZ
@zznate

CTO
Apache Cassandra Consulting
http://www.thelastpickle.com


Re: Meltdown/Spectre Linux patch - Performance impact on Cassandra?

2018-01-09 Thread Tony Anecito
Hi All,
Has anyone seen any test results for SQL Server? Although I am a Cassandra
user, I do use SQL Server for other companies.
Thanks,
-Tony

From: Dor Laor
To: user@cassandra.apache.org
Sent: Tuesday, January 9, 2018 10:31 AM
Subject: Re: Meltdown/Spectre Linux patch - Performance impact on Cassandra?
Make sure you pick instances with PCID CPU capability; their TLB flush
overhead is much smaller.
On Tue, Jan 9, 2018 at 2:04 AM, Steinmaurer, Thomas wrote:

Quick follow up.

Others in AWS reporting/seeing something similar, e.g.:
https://twitter.com/BenBromhead/status/950245250504601600

So, while we have seen a relative CPU increase of ~50% since Jan 4, 2018, we
have now also applied a kernel update at OS/VM level on a single node
(loadtest and not production though), so that node is more or less double
patched now. The additional CPU impact of the OS/VM-level kernel patching is
more or less negligible, so this looks highly hypervisor-related.

Regards,
Thomas

From: Steinmaurer, Thomas [mailto:thomas.steinmaurer@dynatrace.com]
Sent: Friday, 05 January 2018 12:09
To: user@cassandra.apache.org
Subject: Meltdown/Spectre Linux patch - Performance impact on Cassandra?

Hello,

Has anybody some experience/results yet on whether a Linux kernel patched for
Meltdown/Spectre affects Cassandra performance negatively?

In production, with all nodes running in AWS on m4.xlarge, we see up to a 50%
relative CPU increase (e.g. AVG CPU from 40% => 60%) since Jan 4, 2018, most
likely correlating with Amazon finishing patching of the underlying hypervisor
infrastructure …

Anybody else seeing a similar CPU increase?

Thanks,
Thomas

Re: Meltdown/Spectre Linux patch - Performance impact on Cassandra?

2018-01-09 Thread Dor Laor
Make sure you pick instances with PCID CPU capability; their TLB flush
overhead is much smaller.

On Tue, Jan 9, 2018 at 2:04 AM, Steinmaurer, Thomas <
thomas.steinmau...@dynatrace.com> wrote:

> Quick follow up.
>
>
>
> Others in AWS reporting/seeing something similar, e.g.:
> https://twitter.com/BenBromhead/status/950245250504601600
>
>
>
> So, while we have seen a relative CPU increase of ~50% since Jan 4,
> 2018, we have now also applied a kernel update at OS/VM level on a single
> node (loadtest and not production though), so that node is more or less
> double patched now. The additional CPU impact of the OS/VM-level kernel
> patching is more or less negligible, so this looks highly hypervisor-related.
>
>
>
> Regards,
>
> Thomas
>
>
>
> From: Steinmaurer, Thomas [mailto:thomas.steinmau...@dynatrace.com]
> Sent: Friday, 05 January 2018 12:09
> To: user@cassandra.apache.org
> Subject: Meltdown/Spectre Linux patch - Performance impact on Cassandra?
>
>
>
> Hello,
>
>
>
> Has anybody some experience/results yet on whether a Linux kernel patched
> for Meltdown/Spectre affects Cassandra performance negatively?
>
>
>
> In production, with all nodes running in AWS on m4.xlarge, we see up to a 50%
> relative CPU increase (e.g. AVG CPU from 40% => 60%) since Jan 4, 2018,
> most likely correlating with Amazon finishing patching of the underlying
> hypervisor infrastructure …
>
>
>
> Anybody else seeing a similar CPU increase?
>
>
>
> Thanks,
>
> Thomas
>
>
>


Re: Reducing the replication factor

2018-01-09 Thread Jeff Jirsa
Run repair first to ensure the data is properly replicated, then cleanup.
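A sketch of that sequence (keyspace and datacenter names are hypothetical;
adjust to your own replication strategy):

# 1. while RF is still 3, make sure all replicas are consistent:
nodetool repair my_ks
# 2. lower the replication factor:
cqlsh -e "ALTER KEYSPACE my_ks WITH replication = {'class': 'NetworkTopologyStrategy', 'dc1': 2};"
# 3. then, on every node, drop the data the node no longer needs to hold:
nodetool cleanup my_ks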


-- 
Jeff Jirsa


> On Jan 9, 2018, at 9:36 AM, Alessandro Pieri  wrote:
> 
> Dear Everyone,
> 
> We are running Cassandra v2.0.15 on our production cluster.
> 
> We would like to reduce the replication factor from 3 to 2 but we are not 
> sure if it is a safe operation. We would like to get some feedback from you 
> guys. 
> 
> Has anybody tried to shrink the replication factor?
> 
> Does "nodetool cleanup" get rid of the replicated data no longer needed?
> 
> Thanks in advance for your support.
> 
> Regards,
> Alessandro
> 
> 

-
To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
For additional commands, e-mail: user-h...@cassandra.apache.org



Reducing the replication factor

2018-01-09 Thread Alessandro Pieri
Dear Everyone,

We are running Cassandra v2.0.15 on our production cluster.

We would like to reduce the replication factor from 3 to 2 but we are not
sure if it is a safe operation. We would like to get some feedback from you
guys.

Has anybody tried to shrink the replication factor?

Does "nodetool cleanup" get rid of the replicated data no longer needed?

Thanks in advance for your support.

Regards,
Alessandro


Re: 3.0.15 or 3.11.1

2018-01-09 Thread shalom sagges
Thanks a lot for the info!
Much appreciated.

On Tue, Jan 9, 2018 at 2:33 AM, Mick Semb Wever 
wrote:

>
>
>> Can you please provide some JIRAs for superior fixes and performance
>> improvements which are present in 3.11.1 but are missing in 3.0.15.
>>
>
>
> Some that come to mind…
>
> Cassandra Storage Engine: CASSANDRA-12269, CASSANDRA-12731
>
> Streaming and Compaction: CASSANDRA-11206, CASSANDRA-9766, CASSANDRA-11623
>
> Reintroduce off-heap memtables – CASSANDRA-9472
>
>


Re: Full repair caused disk space increase issue

2018-01-09 Thread Jon Haddad
The old files will not be split.  TWCS doesn’t ever do that.  

> On Jan 9, 2018, at 12:26 AM, wxn...@zjqunshuo.com wrote:
> 
> Hi Alex,
> After I changed one node to TWCS using a JMX command, it started to compact. I
> expected the old large sstable files to be split into smaller ones according
> to the time bucket, but I still got large sstable files.
> 
> JMX command used:
> set CompactionParametersJson 
> {"class":"com.jeffjirsa.cassandra.db.compaction.TimeWindowCompactionStrategy","compaction_window_unit":"DAYS","compaction_window_size":"8"}
> 
> Logs:
> INFO  [CompactionExecutor:4] 2018-01-09 15:55:04,525 CompactionManager.java:654 - Will not compact /mnt/hadoop/cassandra/data/system/batchlog-0290003c977e397cac3efdfdc01d626b/lb-37-big: it is not an active sstable
> INFO  [CompactionExecutor:4] 2018-01-09 15:55:04,525 CompactionManager.java:664 - No files to compact for user defined compaction
> 
> Does the last log line mean something?
> 
> Cheers,
> -Simon
>  
> From: wxn...@zjqunshuo.com 
> Date: 2018-01-05 15:54
> To: user 
> Subject: Re: Full repair caused disk space increase issue
> Thanks Alex. Some nodes have finished anticompaction and disk space got
> reclaimed, as you mentioned.
> BTW, after reading your
> post (http://thelastpickle.com/blog/2016/12/08/TWCS-part1.html) on TWCS, I
> decided to use TWCS, and doing the full repair is one of the preparations for
> changing to TWCS.
>  
> From: Alexander Dejanovski 
> Date: 2018-01-05 15:17
> To: user 
> Subject: Re: Full repair caused disk space increase issue
> Hi Simon,
> 
> since Cassandra 2.2, anticompaction is performed in all types of repairs, 
> except subrange repair.
> Given that you have some very big SSTables, the temporary space used by 
> anticompaction (which does the opposite of compaction: read one sstable, 
> output two sstables) will impact your disk usage while it's running. It will 
> reach a peak when they are close to completion.
> The anticompaction that is reported by compactionstats is currently using an 
> extra 147GB*[compression ratio]. So with a compression ratio of 0.3 for 
> example, that would be 44GB that will get reclaimed shortly after the 
> anticompaction is over.
> 
> You can check the current overhead of compaction by listing temporary 
> sstables: *tmp*Data.db
> 
> It's also possible that you have some overstreaming that occurred during your 
> repair, which will increase the size on disk until it gets compacted away 
> (over time).
> You should also check if you don't have snapshots sticking around by running 
> "nodetool listsnapshots".
> 
> Now, you're mentioning that you ran repair to evict tombstones. This is not 
> what repair does, and tombstones are evicted through compaction when they 
> meet the requirements (gc_grace_seconds and all the cells of the partition 
> involved in the same compaction).
> If you want to optimize your tombstone eviction, especially with STCS, I
> advise turning on unchecked_tombstone_compaction, which will allow
> single-sstable compactions to be triggered by Cassandra when there is more
> than 20% of estimated droppable tombstones in an SSTable.
> You can check your current droppable tombstone ratio by running 
> sstablemetadata on all your sstables.
> A command like the following should do the trick (it will print out min/max 
> timestamps too):
> 
> for f in *Data.db; do meta=$(sudo sstablemetadata $f); echo -e "Max:" $(date --date=@$(echo "$meta" | grep Maximum\ time | cut -d" " -f3 | cut -c 1-10) '+%m/%d/%Y') "Min:" $(date --date=@$(echo "$meta" | grep Minimum\ time | cut -d" " -f3 | cut -c 1-10) '+%m/%d/%Y') $(echo "$meta" | grep droppable) ' \t ' $(ls -lh $f | awk '{print $5" "$6" "$7" "$8" "$9}'); done | sort
> 
> Check if the 20% threshold is high enough by verifying that newly created 
> SSTables don't already reach that level, and adjust accordingly if it's the 
> case (for example raise the threshold to 50%).
> 
> To activate tombstone compactions with a 50% droppable tombstone
> threshold, run the following statement on your table:
> 
> ALTER TABLE cargts.eventdata WITH compaction = 
> {'class':'SizeTieredCompactionStrategy', 
> 'unchecked_tombstone_compaction':'true', 'tombstone_threshold':'0.5'}
> 
> Picking the right threshold is up to you.
> Note that tombstone compactions running more often will use temporary space 
> as well, but they should help evicting tombstones faster if the partitions 
> are contained within a single SSTable.
> 
> If you are dealing with TTLed data and your partitions spread over time, I'd 
> strongly suggest considering TWCS instead of STCS which can remove fully 
> expired SSTables much more efficiently.
> 
> Cheers,
> 
> 
> On Fri, Jan 5, 2018 at 7:43 AM wxn...@zjqunshuo.com 
> 

RE: Meltdown/Spectre Linux patch - Performance impact on Cassandra?

2018-01-09 Thread Steinmaurer, Thomas
Quick follow up.

Others in AWS reporting/seeing something similar, e.g.: 
https://twitter.com/BenBromhead/status/950245250504601600

So, while we have seen a relative CPU increase of ~50% since Jan 4, 2018, we
have now also applied a kernel update at OS/VM level on a single node (loadtest
and not production though), so that node is more or less double patched now.
The additional CPU impact of the OS/VM-level kernel patching is more or less
negligible, so this looks highly hypervisor-related.
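For anyone trying to isolate which layer changed, a quick check of whether the
guest kernel itself is running with page-table isolation (the exact dmesg
wording varies by kernel version):

dmesg | grep -i 'page table isolation'
# the "pti" CPU flag also indicates KPTI is in effect:
grep -m1 -o -w pti /proc/cpuinfo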

Regards,
Thomas

From: Steinmaurer, Thomas [mailto:thomas.steinmau...@dynatrace.com]
Sent: Friday, 05 January 2018 12:09
To: user@cassandra.apache.org
Subject: Meltdown/Spectre Linux patch - Performance impact on Cassandra?

Hello,

Has anybody some experience/results yet on whether a Linux kernel patched for
Meltdown/Spectre affects Cassandra performance negatively?

In production, with all nodes running in AWS on m4.xlarge, we see up to a 50%
relative CPU increase (e.g. AVG CPU from 40% => 60%) since Jan 4, 2018, most
likely correlating with Amazon finishing patching of the underlying hypervisor
infrastructure ...

Anybody else seeing a similar CPU increase?

Thanks,
Thomas



Re: Repair fails for unknown reason

2018-01-09 Thread Hannu Kröger
We have run restarts on the cluster and that doesn’t seem to help at all.

We ran repair separately for each table, and that usually goes through, but
running a repair on the whole keyspace doesn’t.
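(For reference, the per-table runs that succeed look roughly like this, using
the keyspace/table names from the logs quoted below and the flags the thread
says the original repair was started with:)

nodetool repair --full --pr mykeyspace mytable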

Anything anyone?

Hannu

> On 3 Jan 2018, at 23:24, Hannu Kröger  wrote:
> 
> I can certainly try that. No problem there.
> 
> However wouldn’t we then get this kind of errors if that was the case:
> java.lang.RuntimeException: Cannot start multiple repair sessions over the 
> same sstables
> ?
> 
> Hannu
> 
>> On 3 Jan 2018, at 20:50, Nandakishore Tokala wrote:
>> 
>> hi Hannu,
>> 
>> I think some of the repairs are hanging there. please restart all the nodes 
>> in the  cluster and start the repair 
>> 
>> 
>> Thanks
>> Nanda
>> 
>> On Wed, Jan 3, 2018 at 9:35 AM, Hannu Kröger wrote:
>> Additional notes:
>> 
>> 1) If I run the repair just on those tables, it works fine
>> 2) Those tables are empty
>> 
>> Hannu
>> 
>> > On 3 Jan 2018, at 18:23, Hannu Kröger wrote:
>> >
>> > Hello,
>> >
>> > Situation is as follows:
>> >
>> > Repair was started on node X on this keyspace with --full --pr. Repair fails 
>> > on node Y.
>> >
>> > Node Y has debug logging on (DEBUG on org.apache.cassandra) and I’m 
>> > looking at the debug.log. I see following messages related to this repair 
>> > request:
>> >
>> > ---
>> > DEBUG [AntiEntropyStage:1] 2018-01-02 17:52:12,530 RepairMessageVerbHandler.java:114 - Validating ValidationRequest{gcBefore=1511473932} org.apache.cassandra.repair.messages.ValidationRequest@5a17430c
>> > DEBUG [ValidationExecutor:4] 2018-01-02 17:52:12,531 StorageService.java:3321 - Forcing flush on keyspace mykeyspace, CF mytable
>> > DEBUG [MemtablePostFlush:54] 2018-01-02 17:52:12,531 ColumnFamilyStore.java:954 - forceFlush requested but everything is clean in mytable
>> > ERROR [ValidationExecutor:4] 2018-01-02 17:52:12,532 Validator.java:268 - Failed creating a merkle tree for [repair #1df000a0-effa-11e7-8361-b7c9edfbfc33 on mykeyspace/mytable, [(6917529027641081856,-9223372036854775808]]], /123.123.123.123 (see log for details)
>> > ---
>> >
>> > then the same appears for another table, and after that the following,
>> > which indicates that the repair “master” has basically told it to abort, right?
>> >
>> > ---
>> > DEBUG [AntiEntropyStage:1] 2018-01-02 17:52:12,563 RepairMessageVerbHandler.java:142 - Got anticompaction request AnticompactionRequest{parentRepairSession=1de949e0-effa-11e7-8361-b7c9edfbfc33} org.apache.cassandra.repair.messages.AnticompactionRequest@5dc8beea
>> > ERROR [AntiEntropyStage:1] 2018-01-02 17:52:12,563 RepairMessageVerbHandler.java:168 - Got error, removing parent repair session
>> > ERROR [AntiEntropyStage:1] 2018-01-02 17:52:12,564 CassandraDaemon.java:228 - Exception in thread Thread[AntiEntropyStage:1,5,main]
>> > java.lang.RuntimeException: java.lang.RuntimeException: Parent repair session with id = 1de949e0-effa-11e7-8361-b7c9edfbfc33 has failed.
>> > at org.apache.cassandra.repair.RepairMessageVerbHandler.doVerb(RepairMessageVerbHandler.java:171) ~[apache-cassandra-3.11.0.jar:3.11.0]
>> > at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:66) ~[apache-cassandra-3.11.0.jar:3.11.0]
>> > at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) ~[na:1.8.0_111]
>> > at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[na:1.8.0_111]
>> > at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) ~[na:1.8.0_111]
>> > at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_111]
>> > at org.apache.cassandra.concurrent.NamedThreadFactory.lambda$threadLocalDeallocator$0(NamedThreadFactory.java:81) [apache-cassandra-3.11.0.jar:3.11.0]
>> > at java.lang.Thread.run(Thread.java:745) ~[na:1.8.0_111]
>> > Caused by: java.lang.RuntimeException: Parent repair session with id = 1de949e0-effa-11e7-8361-b7c9edfbfc33 has failed.
>> > at org.apache.cassandra.service.ActiveRepairService.getParentRepairSession(ActiveRepairService.java:409) ~[apache-cassandra-3.11.0.jar:3.11.0]
>> > at org.apache.cassandra.service.ActiveRepairService.doAntiCompaction(ActiveRepairService.java:444) ~[apache-cassandra-3.11.0.jar:3.11.0]
>> > at org.apache.cassandra.repair.RepairMessageVerbHandler.doVerb(RepairMessageVerbHandler.java:143) ~[apache-cassandra-3.11.0.jar:3.11.0]
>> > ... 7 common frames omitted
>> > ---
>> >
>> > But that 

Re: Full repair caused disk space increase issue

2018-01-09 Thread wxn...@zjqunshuo.com
Hi Alex,
After I changed one node to TWCS using a JMX command, it started to compact. I
expected the old large sstable files to be split into smaller ones according to
the time bucket, but I still got large sstable files.

JMX command used:
set CompactionParametersJson 
{"class":"com.jeffjirsa.cassandra.db.compaction.TimeWindowCompactionStrategy","compaction_window_unit":"DAYS","compaction_window_size":"8"}

Logs:
INFO  [CompactionExecutor:4] 2018-01-09 15:55:04,525 CompactionManager.java:654 - Will not compact /mnt/hadoop/cassandra/data/system/batchlog-0290003c977e397cac3efdfdc01d626b/lb-37-big: it is not an active sstable
INFO  [CompactionExecutor:4] 2018-01-09 15:55:04,525 CompactionManager.java:664 - No files to compact for user defined compaction

Does the last log line mean something?

Cheers,
-Simon
 
From: wxn...@zjqunshuo.com
Date: 2018-01-05 15:54
To: user
Subject: Re: Full repair caused disk space increase issue
Thanks Alex. Some nodes have finished anticompaction and disk space got
reclaimed, as you mentioned.
BTW, after reading your
post (http://thelastpickle.com/blog/2016/12/08/TWCS-part1.html) on TWCS, I
decided to use TWCS, and doing the full repair is one of the preparations for
changing to TWCS.
 
From: Alexander Dejanovski
Date: 2018-01-05 15:17
To: user
Subject: Re: Full repair caused disk space increase issue
Hi Simon,

since Cassandra 2.2, anticompaction is performed in all types of repairs, 
except subrange repair.
Given that you have some very big SSTables, the temporary space used by 
anticompaction (which does the opposite of compaction: read one sstable, 
output two sstables) will impact your disk usage while it's running. It will 
reach a peak when they are close to completion.
The anticompaction that is reported by compactionstats is currently using an 
extra 147GB*[compression ratio]. So with a compression ratio of 0.3 for 
example, that would be 44GB that will get reclaimed shortly after the 
anticompaction is over.

You can check the current overhead of compaction by listing temporary
sstables: *tmp*Data.db

It's also possible that you have some overstreaming that occurred during your 
repair, which will increase the size on disk until it gets compacted away (over 
time).
You should also check if you don't have snapshots sticking around by running 
"nodetool listsnapshots".

Now, you're mentioning that you ran repair to evict tombstones. This is not 
what repair does, and tombstones are evicted through compaction when they meet 
the requirements (gc_grace_seconds and all the cells of the partition involved 
in the same compaction).
If you want to optimize your tombstone eviction, especially with STCS, I advise
turning on unchecked_tombstone_compaction, which will allow single-sstable
compactions to be triggered by Cassandra when there is more than 20% of
estimated droppable tombstones in an SSTable.
You can check your current droppable tombstone ratio by running sstablemetadata 
on all your sstables.
A command like the following should do the trick (it will print out min/max
timestamps too):

# For each sstable: print max/min data timestamps and the droppable-tombstone
# estimate from sstablemetadata, plus size/date/name from ls; the output is
# piped through sort at the end.
for f in *Data.db; do
  meta=$(sudo sstablemetadata $f)
  echo -e "Max:" $(date --date=@$(echo "$meta" | grep Maximum\ time | cut -d" " -f3 | cut -c 1-10) '+%m/%d/%Y') \
    "Min:" $(date --date=@$(echo "$meta" | grep Minimum\ time | cut -d" " -f3 | cut -c 1-10) '+%m/%d/%Y') \
    $(echo "$meta" | grep droppable) ' \t ' \
    $(ls -lh $f | awk '{print $5" "$6" "$7" "$8" "$9}')
done | sort

Check if the 20% threshold is high enough by verifying that newly created 
SSTables don't already reach that level, and adjust accordingly if it's the 
case (for example raise the threshold to 50%).

To activate tombstone compactions with a 50% droppable tombstone
threshold, run the following statement on your table:

ALTER TABLE cargts.eventdata WITH compaction = 
{'class':'SizeTieredCompactionStrategy', 
'unchecked_tombstone_compaction':'true', 'tombstone_threshold':'0.5'}

Picking the right threshold is up to you.
Note that tombstone compactions running more often will use temporary space as 
well, but they should help evicting tombstones faster if the partitions are 
contained within a single SSTable.

If you are dealing with TTLed data and your partitions spread over time, I'd 
strongly suggest considering TWCS instead of STCS which can remove fully 
expired SSTables much more efficiently.

Cheers,


On Fri, Jan 5, 2018 at 7:43 AM wxn...@zjqunshuo.com  
wrote:
Hi All,
In order to evict tombstones, I issued a full repair with the command "nodetool
repair -pr -full". Then the data load size indeed decreased by 100G for each
node according to "nodetool status", but the actual disk usage increased by
500G for each node. The repair is still ongoing and leaving less and less disk
space for me.

From compactionstats, I see "Anticompaction after repair". Based on my
understanding, it is for incremental repair, changing sstable metadata to
indicate which file is repaired,