Poor Performance of Cassandra UDF/UDA

2017-09-26 Thread Xin Jin
Hi All, I am new to the Cassandra community and thank you in advance for your kind comments on an issue we met recently. We have found that running a query with direct UDF execution is ten times faster than with async UDF execution. The in-line comment: "Using async UDF execution is
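
For anyone new to UDFs, here is a minimal sketch of what is being timed, written against the Java driver; the demo keyspace, numbers table, and double_it function are made up for illustration, and UDFs have to be enabled in cassandra.yaml (enable_user_defined_functions) before this runs:

    import com.datastax.driver.core.Cluster;
    import com.datastax.driver.core.Session;

    public class UdfSketch {
        public static void main(String[] args) {
            try (Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
                 Session session = cluster.connect()) {
                // A trivial Java UDF, registered through CQL and evaluated server-side per row.
                session.execute(
                    "CREATE FUNCTION IF NOT EXISTS demo.double_it (input int) " +
                    "RETURNS NULL ON NULL INPUT RETURNS int " +
                    "LANGUAGE java AS 'return input * 2;'");
                // The UDF is evaluated server-side for every row returned by the read.
                session.execute("SELECT id, demo.double_it(value) FROM demo.numbers")
                       .forEach(row -> System.out.println(row));
            }
        }
    }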

Re: User Defined Compaction Issue

2017-09-26 Thread shalom sagges
Awesome explanation :-) Thanks a lot! On Tue, Sep 26, 2017 at 3:40 PM, Jeff Jirsa wrote: > Write row A, flush into sstable 1 > Delete row A, flush the tombstone into sstable 2 > > The tombstone in sstable 2 can’t be removed until row A in sstable 1 gets > removed. If you just

Re: Datastax Driver Mapper & Secondary Indexes

2017-09-26 Thread DuyHai Doan
If you're looking for schema generation from Bean annotations: https://github.com/doanduyhai/Achilles/wiki/DDL-Scripts-Generation On Tue, Sep 26, 2017 at 2:50 PM, Daniel Hölbling-Inzko < daniel.hoelbling-in...@bitmovin.com> wrote: > Hi, I also just figured out that there is no schema generation

Re: User Defined Compaction Issue

2017-09-26 Thread Jeff Jirsa
Write row A, flush into sstable 1 Delete row A, flush the tombstone into sstable 2 The tombstone in sstable 2 can’t be removed until row A in sstable 1 gets removed. If you just keep recompacting sstable 2 by itself, the row in sstable 1 remains on disk. -- Jeff Jirsa > On Sep 26, 2017,
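
A minimal sketch of that sequence against a hypothetical demo.users table; the flushes between the two statements would be triggered externally (e.g. nodetool flush) so that each write ends up in its own sstable:

    import com.datastax.driver.core.Cluster;
    import com.datastax.driver.core.Session;

    public class TombstoneOverlapSketch {
        public static void main(String[] args) {
            try (Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
                 Session session = cluster.connect("demo")) {
                // Step 1: write row A, then flush (nodetool flush) so it lands in sstable 1.
                session.execute("INSERT INTO users (id, name) VALUES (1, 'row A')");

                // Step 2: delete row A, then flush again so the tombstone lands in sstable 2.
                session.execute("DELETE FROM users WHERE id = 1");

                // Compacting sstable 2 on its own (e.g. a user-defined compaction) cannot purge
                // the tombstone, because it still shadows the copy of row A sitting in sstable 1.
            }
        }
    }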

Re: Datastax Driver Mapper & Secondary Indexes

2017-09-26 Thread Daniel Hölbling-Inzko
Hi, I also just figured out that there is no schema generation from the mapper. Thanks for pointing me to the secondary index info. I'll have a look. greetings Daniel On Tue, 26 Sep 2017 at 09:42 kurt greaves wrote: > If you've created a secondary index you simply query it

Re: Compaction through put and compaction tasks

2017-09-26 Thread kurt greaves
The number of active tasks is controlled by the concurrent_compactors yaml setting; the recommendation is to set it to the number of CPU cores you have. The number of pending tasks is an estimate generated by Cassandra of the compactions still needed to reach a completely compacted state (i.e., there are no more possible compactions). How this is
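
For watching that estimate programmatically rather than through nodetool compactionstats, a minimal sketch over JMX, assuming the default JMX port 7199 is reachable; it reads the Compaction PendingTasks gauge exposed by Cassandra:

    import javax.management.MBeanServerConnection;
    import javax.management.ObjectName;
    import javax.management.remote.JMXConnector;
    import javax.management.remote.JMXConnectorFactory;
    import javax.management.remote.JMXServiceURL;

    public class PendingCompactions {
        public static void main(String[] args) throws Exception {
            JMXServiceURL url =
                new JMXServiceURL("service:jmx:rmi:///jndi/rmi://127.0.0.1:7199/jmxrmi");
            try (JMXConnector connector = JMXConnectorFactory.connect(url)) {
                MBeanServerConnection mbs = connector.getMBeanServerConnection();
                // Same estimate that nodetool compactionstats prints as "pending tasks".
                Object pending = mbs.getAttribute(
                    new ObjectName("org.apache.cassandra.metrics:type=Compaction,name=PendingTasks"),
                    "Value");
                System.out.println("Estimated pending compactions: " + pending);
            }
        }
    }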

Re: GC/CPU increase after upgrading to 3.0.14 (from 2.1.18)

2017-09-26 Thread Matope Ono
Hi. We met a similar situation after upgrading from 2.1.14 to 3.11 in our production environment. Have you already tried G1GC instead of CMS? Our timeouts were mitigated after replacing CMS with G1GC. Thanks. 2017-09-25 20:01 GMT+09:00 Steinmaurer, Thomas < thomas.steinmau...@dynatrace.com>: > Hello, > > >

Re: GC/CPU increase after upgrading to 3.0.14 (from 2.1.18)

2017-09-26 Thread Alexander Dejanovski
Hi Thomas, I wouldn't move to G1GC with small heaps (<24GB) but just looking at your ticket I think that your new gen is way too small. I get that it worked better in 2.1 in your case though, which would suggest that the memory footprint is different between 2.1 and 3.0. It looks like you're

Datastax Driver Mapper & Secondary Indexes

2017-09-26 Thread Daniel Hölbling-Inzko
Hi, I am currently moving an application from SQL to Cassandra using Java. I successfully got the DataStax driver and the mapper up and running, but can't seem to figure out how to set secondary indexes through the mapper. I also can't seem to find anything related to indexes in the mapper sources

Re: Datastax Driver Mapper & Secondary Indexes

2017-09-26 Thread kurt greaves
If you've created a secondary index you simply query it by specifying it as part of the where clause. Note that you should really understand the drawbacks of secondary indexes before using them, as they might not be incredibly efficient depending on what you need them for.
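
A minimal sketch of that with the Java driver 3.x object mapper; the demo.users table, its email column, and the index name are made up for illustration, and the index itself is created with plain CQL because the mapper does not generate schema:

    import com.datastax.driver.core.Cluster;
    import com.datastax.driver.core.Session;
    import com.datastax.driver.mapping.MappingManager;
    import com.datastax.driver.mapping.Result;
    import com.datastax.driver.mapping.annotations.Accessor;
    import com.datastax.driver.mapping.annotations.Column;
    import com.datastax.driver.mapping.annotations.PartitionKey;
    import com.datastax.driver.mapping.annotations.Query;
    import com.datastax.driver.mapping.annotations.Table;

    public class SecondaryIndexSketch {

        @Table(keyspace = "demo", name = "users")
        public static class User {
            @PartitionKey
            private int id;
            @Column(name = "email")
            private String email;
            public int getId() { return id; }
            public void setId(int id) { this.id = id; }
            public String getEmail() { return email; }
            public void setEmail(String email) { this.email = email; }
        }

        @Accessor
        public interface UserQueries {
            // Only works because the email column has a secondary index.
            @Query("SELECT * FROM demo.users WHERE email = ?")
            Result<User> findByEmail(String email);
        }

        public static void main(String[] args) {
            try (Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
                 Session session = cluster.connect()) {
                // The index is plain CQL; the mapper does not generate any schema.
                session.execute("CREATE INDEX IF NOT EXISTS users_email_idx ON demo.users (email)");

                MappingManager manager = new MappingManager(session);
                UserQueries queries = manager.createAccessor(UserQueries.class);
                queries.findByEmail("user@example.com")
                       .forEach(u -> System.out.println(u.getId()));
            }
        }
    }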

RE: GC/CPU increase after upgrading to 3.0.14 (from 2.1.18)

2017-09-26 Thread Steinmaurer, Thomas
Hi, in our experience CMS does much better with smaller heaps. Regards, Thomas From: Matope Ono [mailto:matope@gmail.com] Sent: Tuesday, 26 September 2017 10:58 To: user@cassandra.apache.org Subject: Re: GC/CPU increase after upgrading to 3.0.14 (from 2.1.18) Hi. We met a similar

RE: GC/CPU increase after upgrading to 3.0.14 (from 2.1.18)

2017-09-26 Thread Steinmaurer, Thomas
Hi Alex, we tested with larger new gen sizes up to ¼ of the max heap, but m4.xlarge looks like it is too weak to deal with a larger new gen. The result was that we then got many more GCInspector-related logs, but perhaps we need to re-test. Right, we are using batches extensively, unlogged/non-atomic.
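
For context, a minimal sketch of an unlogged batch with the Java driver, assuming a hypothetical demo.events table (device_id bigint, ts bigint, value double); unlogged batches mainly pay off when every statement targets the same partition:

    import com.datastax.driver.core.BatchStatement;
    import com.datastax.driver.core.Cluster;
    import com.datastax.driver.core.PreparedStatement;
    import com.datastax.driver.core.Session;

    public class UnloggedBatchSketch {
        public static void main(String[] args) {
            try (Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
                 Session session = cluster.connect("demo")) {
                PreparedStatement insert =
                    session.prepare("INSERT INTO events (device_id, ts, value) VALUES (?, ?, ?)");

                // Unlogged (non-atomic) batch: skips the batch log, so it only saves work when
                // all statements hit the same partition; spreading it across partitions just
                // shifts coordination cost onto a single coordinator node.
                BatchStatement batch = new BatchStatement(BatchStatement.Type.UNLOGGED);
                long deviceId = 42L;
                for (int i = 0; i < 10; i++) {
                    batch.add(insert.bind(deviceId, (long) i, (double) i));
                }
                session.execute(batch);
            }
        }
    }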

Re: Interrogation about expected performance

2017-09-26 Thread kurt greaves
Sounds reasonable. How big are your writes? Also, are you seeing a bottleneck? If so, what are the details? If everything is running fine with 200k writes/sec (compactions aren't backing up, not a lot of disk IO) then that's good. However, you will also need to compare what you can achieve when you
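
One way to get a client-side throughput number is to issue asynchronous writes with a bounded number of in-flight requests; a minimal sketch below, where the demo.events table and the 1024 in-flight cap are assumptions for illustration:

    import com.datastax.driver.core.Cluster;
    import com.datastax.driver.core.PreparedStatement;
    import com.datastax.driver.core.ResultSetFuture;
    import com.datastax.driver.core.Session;

    import java.util.concurrent.Semaphore;

    public class WriteThroughputSketch {
        public static void main(String[] args) throws InterruptedException {
            try (Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
                 Session session = cluster.connect("demo")) {
                PreparedStatement insert =
                    session.prepare("INSERT INTO events (device_id, ts, value) VALUES (?, ?, ?)");

                int maxInFlight = 1024;           // bound concurrency instead of blocking per write
                Semaphore inFlight = new Semaphore(maxInFlight);
                long writes = 500_000;
                long start = System.nanoTime();

                for (long i = 0; i < writes; i++) {
                    inFlight.acquire();
                    ResultSetFuture f = session.executeAsync(insert.bind(i % 100, i, (double) i));
                    // Release the permit when the write completes (success or failure).
                    f.addListener(inFlight::release, Runnable::run);
                }
                inFlight.acquire(maxInFlight);    // drain: wait for the last in-flight writes
                double secs = (System.nanoTime() - start) / 1e9;
                System.out.printf("%.0f writes/sec%n", writes / secs);
            }
        }
    }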

Re: User Defined Compaction Issue

2017-09-26 Thread shalom sagges
Thanks Jeff! I'll try that. I'm not sure I understand how the tombstones are covering data in another file. Do you have a small example perhaps? Thanks again! On Tue, Sep 26, 2017 at 1:38 AM, Jeff Jirsa wrote: > The problem is likely that your sstables overlap - your 91%

Compaction through put and compaction tasks

2017-09-26 Thread Anshu Vajpayee
Hello - I have a fairly generic question regarding compaction. How does Cassandra internally generate the number of compaction tasks? How is it affected by compaction throughput? If we increase the compaction throughput, will the number of compaction tasks per second increase for the same

nodetool cleanup in parallel

2017-09-26 Thread Peng Xiao
hi, nodetool cleanup will only remove keys that no longer belong to the node, so theoretically we can run nodetool cleanup in parallel, right? The documentation suggests running it one node at a time, but that's too slow. Thanks, Peng Xiao

Re: nodetool cleanup in parallel

2017-09-26 Thread Peng Xiao
Thanks Kurt. ------ From: "kurt"; Date: 27 September 2017 (Wed) 11:57; To: "User"; Subject: Re: nodetool cleanup in parallel Correct. You can run it in parallel across many nodes if you have

RE: Re: nodetool cleanup in parallel

2017-09-26 Thread Steinmaurer, Thomas
Side-note: At least with 2.1 (or even later), be aware that you might run into the following issue: https://issues.apache.org/jira/browse/CASSANDRA-11155 We are doing cron-job based hourly snapshots in production and have tried to also run cleanup after extending a cluster from 6 to 9 nodes.

Re: nodetool cleanup in parallel

2017-09-26 Thread kurt greaves
Correct. You can run it in parallel across many nodes if you have capacity. You'll generally see about a 10% CPU increase from cleanups, which isn't a big deal if you have the capacity to handle it plus the IO. On that note, on later versions you can specify -j to run multiple cleanup compactions at the