How can I scale my read rate?

2017-03-11 Thread S G
Hi,

I have a 9 node cassandra cluster, each node has:
RAM: 5.5gb
Disk: 64gb
C* version: 3.3
Java: 1.8.0_51

The cluster stores about 2 million rows, and the partition keys of all these
rows are unique.

I am stress-reading it from a 12-node client cluster.
Each read-client has 50 threads, so total 600 read-threads from 12 machines.
Each read query is get-by-primary-key where keys are read randomly from a
file having all primary-keys.


I am able to get only 15,000 reads/second from the entire system.
Is this the best read performance I can expect?
Are there any benchmarks I can compare against?
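
As a rough sanity check on the figures above, Little's law relates the number of
in-flight requests to throughput and mean latency. The sketch below assumes
(this is not stated above) that every client thread issues one synchronous read
at a time, so 600 threads means roughly 600 requests in flight. The per-node
figure counts requests arriving at the cluster and ignores the replication
factor, which is not given.

```python
# Back-of-the-envelope numbers from the figures quoted in this message.
# Assumption: each client thread keeps exactly one synchronous read in flight.

CLIENT_MACHINES = 12
THREADS_PER_CLIENT = 50
CASSANDRA_NODES = 9
TOTAL_READS_PER_SEC = 15_000

total_threads = CLIENT_MACHINES * THREADS_PER_CLIENT

# Little's law: in-flight requests = throughput * mean latency,
# so mean latency = in-flight requests / throughput.
mean_latency_ms = total_threads / TOTAL_READS_PER_SEC * 1000

reads_per_thread = TOTAL_READS_PER_SEC / total_threads
reads_per_node = TOTAL_READS_PER_SEC / CASSANDRA_NODES

print(f"client threads:          {total_threads}")
print(f"mean latency per read:   {mean_latency_ms:.1f} ms")
print(f"reads/s per thread:      {reads_per_thread:.1f}")
print(f"reads/s per server node: {reads_per_node:.0f}")
```

Under that assumption each read takes about 40 ms end to end as seen by the
client, so measuring where those milliseconds go (client, network, coordinator,
disk) is likely more informative than the aggregate rate alone.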


I tried changing bloom_filter_fp_chance from 0.01 to 0.0001 and setting caching
to {'keys': 'ALL', 'rows_per_partition': 'ALL'}, but neither had any effect.
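
For a sense of scale, the textbook sizing formula for a classic Bloom filter
gives the bits needed per key at a given false-positive chance. This is a
minimal sketch using that generic formula; Cassandra's own filter sizing may
differ, and the total ignores replication and the number of SSTables per node.

```python
import math


def bloom_bits_per_key(fp_chance: float) -> float:
    """Bits per key for an optimally sized classic Bloom filter."""
    return -math.log(fp_chance) / (math.log(2) ** 2)


ROWS = 2_000_000  # from the message above: one unique partition key per row

for fp in (0.01, 0.0001):
    bits = bloom_bits_per_key(fp)
    total_mb = ROWS * bits / 8 / 1e6  # bits -> bytes -> megabytes
    print(f"fp_chance={fp}: {bits:.1f} bits/key, ~{total_mb:.1f} MB for all keys")
```

By this estimate the lower setting roughly doubles filter memory, while at 0.01
only about 1 in 100 filter checks for an absent key is a false positive to
begin with, which may be one reason the change made no visible difference.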

Thx,
SG


Re: State of triggers

2017-03-03 Thread S G
Does Cassandra itself use triggers internally for something?
That would make a pretty good case for triggers being ready for production
use.

Otherwise, it would tend to be a neglected feature because active
developers would have no good reason to add features to it other than just
making the test suite pass.

On Fri, Mar 3, 2017 at 9:04 AM, Jeff Jirsa wrote:

> On Fri, Mar 3, 2017 at 5:40 AM, Edward Capriolo wrote:
>
> >
> > I used them. I built do-it-yourself secondary indexes with them. They have
> > their gotchas, but so do all the secondary index implementations. Just
> > because DataStax does not write about something. Let's see, like 5 years ago
> > there was this: https://github.com/hmsonline/cassandra-triggers
> >
> >
> Still in use? How'd it work? Production ready? Would you still do it that
> way in 2017?
>
>
> > There is a fairly large divergence to what actual users do and what other
> > groups 'say' actual users do in some cases.
> >
>
> A lot of people don't share what they're doing (for business reasons, or
> because they don't think it's important, or because they don't know
> how/where), and that's fine but it makes it hard for anyone to know what
> features are used, or how well they're really working in production.
>
> I've seen a handful of "how do we use triggers" questions in IRC, and they
> weren't unreasonable questions, but seemed like a lot of pain, and more
> than one of those people ultimately came back and said they used some other
> mechanism (and of course, some of them silently disappear, so we have no
> idea if it worked or not).
>
> If anyone's actively using triggers, please don't keep it a secret. Knowing
> that they're being used would be a great way to justify continuing to
> maintain them.
>
> - Jeff
>


State of triggers

2017-03-02 Thread S G
Hi,

I am not able to find any documentation on whether triggers are currently
production ready.

The post at
http://www.datastax.com/dev/blog/whats-new-in-cassandra-2-0-prototype-triggers-support
says that "The current implementation is experimental, and there is some
work to do before triggers in Cassandra can be declared final and
production-ready."

So in which version of Cassandra can we expect triggers to be stable enough?
Our requirement is to develop a solution for several Cassandra users all
running on different versions (they won't upgrade easily) and no one is
using 3.5+ versions.
So the smallest Cassandra version which has production ready triggers would
be really good to know.

Also any advice on common gotchas with Cassandra triggers would be great to
know.

Thanks
SG


Re: How to read CDC from Cassandra?

2017-02-16 Thread S G
Hey Jay,

Thanks for the pointer.
I have spent quite some time trying to understand this, and even went
through a good deal of
https://github.com/apache/cassandra/commit/e31e216234c6b57a531cae607e0355666007deb2,
but I am not able to understand how this whole thing works.


*Can someone please correct my understanding till now (stated below)?*

1) Cassandra would only write to the CDC log, and never delete from it.
2) Cleaning up consumed logfiles would be the client daemon's responsibility.
3) Daemons should be able to checkpoint their work, and resume from where
they left off.
   This means they would have to leave some file artifact in the CDC log's
directory.
4) Upon flush, CommitLogSegments containing data for CDC-enabled tables are
moved to the data/cdc_raw directory until removed by the user
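
Points 2) to 4) above can be sketched as a small consumer loop. This is only an
illustration of the checkpoint-then-delete idea, not Cassandra's API: the
function name, the JSON checkpoint file, and the handle_segment callback are
all hypothetical, and actually decoding a segment would need Cassandra's own
commit log reading classes. It also assumes segment files end in ".log" and
that sorting by file name gives a usable processing order.

```python
import json
from pathlib import Path
from typing import Callable, List


def consume_cdc_segments(cdc_raw_dir: Path,
                         checkpoint_file: Path,
                         handle_segment: Callable[[Path], None]) -> List[str]:
    """Process each unconsumed segment file once, in file-name order."""
    done = set()
    if checkpoint_file.exists():
        # Resume: names of segments already handled in an earlier run.
        done = set(json.loads(checkpoint_file.read_text()))

    processed = []
    for segment in sorted(cdc_raw_dir.glob("*.log")):
        if segment.name in done:
            # Handled before, but a crash prevented the delete; clean up now.
            segment.unlink()
            continue
        handle_segment(segment)  # hypothetical: decode and forward the changes
        done.add(segment.name)
        # Record progress before deleting, so a crash never loses a segment.
        checkpoint_file.write_text(json.dumps(sorted(done)))
        segment.unlink()  # cleanup is the consumer's job (point 2)
        processed.append(segment.name)
    return processed
```

A daemon would call this periodically, for example
consume_cdc_segments(Path("data/cdc_raw"), Path("cdc_checkpoint.json"), my_handler),
where my_handler is whatever turns a segment into downstream events. Writing the
checkpoint before the delete means a segment may be seen twice after a crash but
is never skipped.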


*Questions:*

1) What is exactly written to the commit log? Is it just the id or the
whole of the object?
2) If it's just the IDs of the inserted/modified rows, then is the client
expected to read the whole object from the ID?
3) If it's the entire payload, how does the client deserialize the payload
to the full row?
4) What about partial updates? Some clients cannot work on partial updates
and will need to read the full object. Any recommendations for those?
5) What is the best way to try out the whole flow? Is it the following:
 - a) Set up cassandra.yaml for cdc and create tables with cdc=true
 - b) Write some data to the table and see the files being generated in the
cdc_raw_directory
 - c) Launch an agent similar to CASSANDRA-11575. Consume and delete the
cdc files?

Thanks for your help,
SG



On Wed, Feb 15, 2017 at 3:19 PM, Jay Zhuang <jay.zhu...@yahoo.com.invalid>
wrote:

> I tried this CASSANDRA-11575 for 3.8. Works great.
>
> Thanks,
> Jay
>
>
> On 2/15/17 3:08 PM, S G wrote:
>
>> Hi,
>>
>> I have gone through several resources mentioned in
>> http://cassandra.apache.org/doc/latest/operating/cdc.html
>>
>> The only thing mentioned about reading the CDC is that it is fairly
>> straightforward with a link to
>> https://github.com/apache/cassandra/blob/e31e216234c6b57a531cae607e0355666007deb2/src/java/org/apache/cassandra/db/commitlog/CommitLogReplayer.java#L132-L140
>>
>> This is way too high level.
>>
>> Can someone please explain or provide me the code to read CDC data after
>> enabling this feature in Cassandra?
>>
>>
>> Thanks
>>
>> SG
>>
>>


How to read CDC from Cassandra?

2017-02-15 Thread S G
Hi,

I have gone through several resources mentioned in
http://cassandra.apache.org/doc/latest/operating/cdc.html

The only thing mentioned about reading the CDC is that it is fairly
straightforward with a link to
https://github.com/apache/cassandra/blob/e31e216234c6b57a531cae607e0355666007deb2/src/java/org/apache/cassandra/db/commitlog/CommitLogReplayer.java#L132-L140

This is way too high level.

Can someone please explain or provide me the code to read CDC data after
enabling this feature in Cassandra?


Thanks

SG