Re: How can I scale my read rate?

2017-03-27 Thread Alexander Dejanovski
By default the TokenAwarePolicy does shuffle replicas, and it can be
disabled if you want to only hit the primary replica for the token range
you're querying :
http://docs.datastax.com/en/drivers/java/3.0/com/datastax/driver/core/policies/TokenAwarePolicy.html

On Mon, Mar 27, 2017 at 9:41 AM Avi Kivity  wrote:

> Is the driver doing the right thing by directing all reads for a given
> token to the same node?  If that node fails, then all of those reads will
> be directed at other nodes, all oh whom will be cache-cold for the the
> failed node's primary token range.  Seems like the driver should distribute
> reads among the all the replicas for a token, at least as an option, to
> keep the caches warm for latency-sensitive loads.
>
> On 03/26/2017 07:46 PM, Eric Stevens wrote:
>
> Yes, throughput for a given partition key cannot be improved with
> horizontal scaling.  You can increase RF to theoretically improve
> throughput on that key, but actually in this case smart clients might hold
> you back, because they're probably token aware, and will try to serve that
> read off the key's primary replica, so all reads would be directed at a
> single node for that key.
>
> If you're reading at CL=QUORUM, there's a chance that increasing RF will
> actually reduce performance rather than improve it, because you've
> increased the total amount of work to serve the read (as well as the
> write).  If you're reading at CL=ONE, increasing RF will increase the
> chances of falling afoul of eventual consistency.
>
> However that's not really a real-world scenario.  Or if it is, Cassandra
> is probably the wrong tool to satisfy that kind of workload.
>
> On Thu, Mar 23, 2017 at 11:43 PM Alain Rastoul 
> wrote:
>
> On 24/03/2017 01:00, Eric Stevens wrote:
> > Assuming an even distribution of data in your cluster, and an even
> > distribution across those keys by your readers, you would not need to
> > increase RF with cluster size to increase read performance.  If you have
> > 3 nodes with RF=3, and do 3 million reads, with good distribution, each
> > node has served 1 million read requests.  If you increase to 6 nodes and
> > keep RF=3, then each node now owns half as much data and serves only
> > 500,000 reads.  Or more meaningfully in the same time it takes to do 3
> > million reads under the 3 node cluster you ought to be able to do 6
> > million reads under the 6 node cluster since each node is just
> > responsible for 1 million total reads.
> >
> Hi Eric,
>
> I think I got your point.
> In case of really evenly distributed  reads it may (or should?) not make
> any difference,
>
> But when you do not distribute well the reads (and in that case only),
> my understanding about RF was that it could help spreading the load :
> In that case, with RF= 4 instead of 3,  with several clients accessing keys
> same key ranges, a coordinator could pick up one node to handle the request
> in 4 replicas instead of picking up one node in 3 , thus having
> more "workers" to handle a request ?
>
> Am I wrong here ?
>
> Thank you for the clarification
>
>
> --
> best,
> Alain
>
>
> --
-
Alexander Dejanovski
France
@alexanderdeja

Consultant
Apache Cassandra Consulting
http://www.thelastpickle.com


Re: How can I scale my read rate?

2017-03-27 Thread Avi Kivity
Is the driver doing the right thing by directing all reads for a given 
token to the same node?  If that node fails, then all of those reads 
will be directed at other nodes, all oh whom will be cache-cold for the 
the failed node's primary token range.  Seems like the driver should 
distribute reads among the all the replicas for a token, at least as an 
option, to keep the caches warm for latency-sensitive loads.



On 03/26/2017 07:46 PM, Eric Stevens wrote:
Yes, throughput for a given partition key cannot be improved with 
horizontal scaling.  You can increase RF to theoretically improve 
throughput on that key, but actually in this case smart clients might 
hold you back, because they're probably token aware, and will try to 
serve that read off the key's primary replica, so all reads would be 
directed at a single node for that key.


If you're reading at CL=QUORUM, there's a chance that increasing RF 
will actually reduce performance rather than improve it, because 
you've increased the total amount of work to serve the read (as well 
as the write).  If you're reading at CL=ONE, increasing RF will 
increase the chances of falling afoul of eventual consistency.


However that's not really a real-world scenario.  Or if it is, 
Cassandra is probably the wrong tool to satisfy that kind of workload.


On Thu, Mar 23, 2017 at 11:43 PM Alain Rastoul > wrote:


On 24/03/2017 01:00, Eric Stevens wrote:
> Assuming an even distribution of data in your cluster, and an even
> distribution across those keys by your readers, you would not
need to
> increase RF with cluster size to increase read performance.  If
you have
> 3 nodes with RF=3, and do 3 million reads, with good
distribution, each
> node has served 1 million read requests.  If you increase to 6
nodes and
> keep RF=3, then each node now owns half as much data and serves only
> 500,000 reads.  Or more meaningfully in the same time it takes
to do 3
> million reads under the 3 node cluster you ought to be able to do 6
> million reads under the 6 node cluster since each node is just
> responsible for 1 million total reads.
>
Hi Eric,

I think I got your point.
In case of really evenly distributed  reads it may (or should?)
not make
any difference,

But when you do not distribute well the reads (and in that case only),
my understanding about RF was that it could help spreading the load :
In that case, with RF= 4 instead of 3,  with several clients
accessing keys
same key ranges, a coordinator could pick up one node to handle
the request
in 4 replicas instead of picking up one node in 3 , thus having
more "workers" to handle a request ?

Am I wrong here ?

Thank you for the clarification


--
best,
Alain





Re: How can I scale my read rate?

2017-03-26 Thread Anthony Grasso
Keep in mind there are side effects to increasing to RF = 4

   - Storage requirements for each node will increase. Depending on the
   number of nodes in the cluster and the size of the data this could be
   significant.
   - Whilst the number of available coordinators increases, the number of
   nodes involved in QUORUM reads/writes will increase from 2 to 3.



On 24 March 2017 at 16:43, Alain Rastoul  wrote:

> On 24/03/2017 01:00, Eric Stevens wrote:
>
>> Assuming an even distribution of data in your cluster, and an even
>> distribution across those keys by your readers, you would not need to
>> increase RF with cluster size to increase read performance.  If you have
>> 3 nodes with RF=3, and do 3 million reads, with good distribution, each
>> node has served 1 million read requests.  If you increase to 6 nodes and
>> keep RF=3, then each node now owns half as much data and serves only
>> 500,000 reads.  Or more meaningfully in the same time it takes to do 3
>> million reads under the 3 node cluster you ought to be able to do 6
>> million reads under the 6 node cluster since each node is just
>> responsible for 1 million total reads.
>>
>> Hi Eric,
>
> I think I got your point.
> In case of really evenly distributed  reads it may (or should?) not make
> any difference,
>
> But when you do not distribute well the reads (and in that case only),
> my understanding about RF was that it could help spreading the load :
> In that case, with RF= 4 instead of 3,  with several clients accessing keys
> same key ranges, a coordinator could pick up one node to handle the request
> in 4 replicas instead of picking up one node in 3 , thus having
> more "workers" to handle a request ?
>
> Am I wrong here ?
>
> Thank you for the clarification
>
>
> --
> best,
> Alain
>
>


Re: How can I scale my read rate?

2017-03-26 Thread Eric Stevens
Yes, throughput for a given partition key cannot be improved with
horizontal scaling.  You can increase RF to theoretically improve
throughput on that key, but actually in this case smart clients might hold
you back, because they're probably token aware, and will try to serve that
read off the key's primary replica, so all reads would be directed at a
single node for that key.

If you're reading at CL=QUORUM, there's a chance that increasing RF will
actually reduce performance rather than improve it, because you've
increased the total amount of work to serve the read (as well as the
write).  If you're reading at CL=ONE, increasing RF will increase the
chances of falling afoul of eventual consistency.

However that's not really a real-world scenario.  Or if it is, Cassandra is
probably the wrong tool to satisfy that kind of workload.

On Thu, Mar 23, 2017 at 11:43 PM Alain Rastoul 
wrote:

On 24/03/2017 01:00, Eric Stevens wrote:
> Assuming an even distribution of data in your cluster, and an even
> distribution across those keys by your readers, you would not need to
> increase RF with cluster size to increase read performance.  If you have
> 3 nodes with RF=3, and do 3 million reads, with good distribution, each
> node has served 1 million read requests.  If you increase to 6 nodes and
> keep RF=3, then each node now owns half as much data and serves only
> 500,000 reads.  Or more meaningfully in the same time it takes to do 3
> million reads under the 3 node cluster you ought to be able to do 6
> million reads under the 6 node cluster since each node is just
> responsible for 1 million total reads.
>
Hi Eric,

I think I got your point.
In case of really evenly distributed  reads it may (or should?) not make
any difference,

But when you do not distribute well the reads (and in that case only),
my understanding about RF was that it could help spreading the load :
In that case, with RF= 4 instead of 3,  with several clients accessing keys
same key ranges, a coordinator could pick up one node to handle the request
in 4 replicas instead of picking up one node in 3 , thus having
more "workers" to handle a request ?

Am I wrong here ?

Thank you for the clarification


--
best,
Alain


Re: How can I scale my read rate?

2017-03-23 Thread Alain Rastoul

On 24/03/2017 01:00, Eric Stevens wrote:

Assuming an even distribution of data in your cluster, and an even
distribution across those keys by your readers, you would not need to
increase RF with cluster size to increase read performance.  If you have
3 nodes with RF=3, and do 3 million reads, with good distribution, each
node has served 1 million read requests.  If you increase to 6 nodes and
keep RF=3, then each node now owns half as much data and serves only
500,000 reads.  Or more meaningfully in the same time it takes to do 3
million reads under the 3 node cluster you ought to be able to do 6
million reads under the 6 node cluster since each node is just
responsible for 1 million total reads.


Hi Eric,

I think I got your point.
In case of really evenly distributed  reads it may (or should?) not make 
any difference,


But when you do not distribute well the reads (and in that case only),
my understanding about RF was that it could help spreading the load :
In that case, with RF= 4 instead of 3,  with several clients accessing keys
same key ranges, a coordinator could pick up one node to handle the request
in 4 replicas instead of picking up one node in 3 , thus having
more "workers" to handle a request ?

Am I wrong here ?

Thank you for the clarification


--
best,
Alain



Re: How can I scale my read rate?

2017-03-23 Thread Eric Stevens
Assuming an even distribution of data in your cluster, and an even
distribution across those keys by your readers, you would not need to
increase RF with cluster size to increase read performance.  If you have 3
nodes with RF=3, and do 3 million reads, with good distribution, each node
has served 1 million read requests.  If you increase to 6 nodes and keep
RF=3, then each node now owns half as much data and serves only 500,000
reads.  Or more meaningfully in the same time it takes to do 3 million
reads under the 3 node cluster you ought to be able to do 6 million reads
under the 6 node cluster since each node is just responsible for 1 million
total reads.

On Mon, Mar 20, 2017 at 11:24 PM Alain Rastoul 
wrote:

> On 20/03/2017 22:05, Michael Wojcikiewicz wrote:
> > Not sure if someone has suggested this, but I believe it's not
> > sufficient to simply add nodes to a cluster to increase read
> > performance: you also need to alter the ReplicationFactor of the
> > keyspace to a larger value as you increase your cluster gets larger.
> >
> > ie. data is available from more nodes in the cluster for each query.
> >
> Yes, good point in case of cluster growth, there would be more replica
> to handle same key ranges.
> And also readjust token ranges :
> https://cassandra.apache.org/doc/latest/operating/topo_changes.html
>
> SG, can you give some information (or share your code) about how you
> generate your data and how you read it ?
>
> --
> best,
> Alain
>
>


Re: How can I scale my read rate?

2017-03-20 Thread Alain Rastoul

On 20/03/2017 22:05, Michael Wojcikiewicz wrote:

Not sure if someone has suggested this, but I believe it's not
sufficient to simply add nodes to a cluster to increase read
performance: you also need to alter the ReplicationFactor of the
keyspace to a larger value as you increase your cluster gets larger.

ie. data is available from more nodes in the cluster for each query.

Yes, good point in case of cluster growth, there would be more replica 
to handle same key ranges.

And also readjust token ranges :
https://cassandra.apache.org/doc/latest/operating/topo_changes.html

SG, can you give some information (or share your code) about how you 
generate your data and how you read it ?


--
best,
Alain



Re: How can I scale my read rate?

2017-03-20 Thread Alain Rastoul

On 20/03/2017 02:35, S G wrote:

2)
https://docs.datastax.com/en/developer/java-driver/3.1/manual/statements/prepared/
tells me to avoid preparing select queries if I expect a change of
columns in my table down the road.
The problem is also related to select * which is considered bad practice 
with most databases...



I did some more testing to see if my client machines were the bottleneck.
For a 6-node Cassandra cluster (each VM having 8-cores), I got 26,000
reads/sec for all of the following:
1) Client nodes:1, Threads: 60
2) Client nodes:3, Threads: 180
3) Client nodes:5, Threads: 300
4) Client nodes:10, Threads: 600
5) Client nodes:20, Threads: 1200

So adding more client nodes or threads to those client nodes is not
having any effect.
I am suspecting Cassandra is simply not allowing me to go any further.

> Primary keys for my schema are:
>  PRIMARY KEY((name, phone), age)
> name: text
> phone: int
> age: int

Yes with such a PK data must be spread on the whole cluster (also taking 
into account the partitioner), strange that the throughput doesn't scale.

I guess you also have verified that you select data randomly?

May be you could have a look at the system traces to see the query plan 
for some requests:
If you are on a test cluster you can truncate the tables before 
(truncate system_traces.sessions; and truncate system_traces.events;), 
run a test then select * from system_traces.events

where session_id = 
xxx being one of the sessions you pick in trace.sessions.

Try to see if you are not always hitting the same nodes.


--
best,
Alain



Re: How can I scale my read rate?

2017-03-19 Thread James Carman
Have you tried using PreparedStatements?

On Sat, Mar 18, 2017 at 9:47 PM S G <sg.online.em...@gmail.com> wrote:

> ok, I gave the executeAsync() a try.
> Good part is that it was really easy to write the code for that.
> Bad part is that it did not had a huge effect on my throughput - I gained
> about 5% increase in throughput.
> I suspect it is so because my queries are all get-by-primary-key queries
> and were anyways completing in less than 2 milliseconds.
> So there was not much wait to begin with.
>
>
> Here is my code:
>
> String getByKeyQueryStr = "Select * from fooTable where key = " + key;
> //ResultSet result = session.execute(getByKeyQueryStr);  // Previous code
> ResultSetFuture future = session.executeAsync(getByKeyQueryStr);
> FutureCallback callback = new MyFutureCallback();
> executor = MoreExecutors.sameThreadExecutor();
> //executor = Executors.newFixedThreadPool(3); // Tried this too, no effect
> //executor = Executors.newFixedThreadPool(10); // Tried this too, no effect
> Futures.addCallback(future, callback, executor);
>
> Can I improve the above code in some way?
> Are there any JMX metrics that can tell me what's going on?
>
> From the vmstat command, I see that CPU idle time is about 70% even though
> I am running about 60 threads per VM
> Total 20 client-VMs with 8 cores each are querying a Cassandra cluster
> with 16 VMs, 8-core each too.
>
> [image: Screen Shot 2017-03-18 at 6.46.03 PM.png]
> ​
> ​
>
>
> Thanks
> SG
>
>
> On Sat, Mar 18, 2017 at 5:38 PM, S G <sg.online.em...@gmail.com> wrote:
>
> Thanks. It seems that you guys have found executeAsync to yield good
> results.
> I want to share my understanding how this could benefit performance and
> some validation from the group will be awesome.
>
> I will call executeAsync() each time I want to get by primary-key.
> That ways, my client thread is not blocked anymore and I can submit a lot
> more requests per unit time.
> The async requests get piled on the underlying Netty I/O thread which
> ensures that it is always busy all the time.
> Earlier, the Netty I/O thread would have wasted some cycles when the
> sync-execute method was processing the results.
> And earlier, the client thread would also have wasted some cycles waiting
> for netty-thread to complete.
>
> With executeAsync(), none of them is waiting.
> Only thing to ensure is that the Netty thread's queue does not grow
> indefinitely.
>
> If the above theory is correct, then it sounds like a really good thing to
> try.
> If not, please do share some more details.
>
>
>
>
> On Sat, Mar 18, 2017 at 2:00 PM, <j.kes...@enercast.de> wrote:
>
> +1 for executeAsync – had a long time to argue that it’s not bad as with
> good old rdbms.
>
>
>
>
>
>
>
> Gesendet von meinem Windows 10 Phone
>
>
>
> *Von: *Arvydas Jonusonis <arvydas.jonuso...@gmail.com>
> *Gesendet: *Samstag, 18. März 2017 19:08
> *An: *user@cassandra.apache.org
> *Betreff: *Re: How can I scale my read rate?
>
>
>
> ..then you're not taking advantage of request pipelining. Use executeAsync
> - this will increase your throughput for sure.
>
>
>
> http://www.datastax.com/dev/blog/java-driver-async-queries
>
>
>
>
>
> On Sat, Mar 18, 2017 at 08:00 S G <sg.online.em...@gmail.com> wrote:
>
> I have enabled JMX but not sure what metrics to look for - they are way
> too many of them.
>
> I am using session.execute(...)
>
>
>
>
>
> On Fri, Mar 17, 2017 at 2:07 PM, Arvydas Jonusonis <
> arvydas.jonuso...@gmail.com> wrote:
>
> It would be interesting to see some of the driver metrics (in your stress
> test tool) - if you enable JMX, they should be exposed by default.
>
> Also, are you using session.execute(..) or session.executeAsync(..) ?
>
>
>
>
>
>
>


Re: How can I scale my read rate?

2017-03-19 Thread Alain Rastoul

On 19/03/2017 02:54, S G wrote:

Forgot to mention that this vmstat picture is for the client-cluster
reading from Cassandra.



Hi SG,


Your numbers are low, 15k req/sec would be ok for a single node, for a 
12 nodes cluster, something goes wrong... how do you measure the 
throughput?


As suggested by others, to achieve good results you have to add threads 
and client VMs: Cassandra scales horizontally, not vertically, ie each 
single node performance will not goes up, but if you spread the load, by 
adding nodes the global cluster performance will.


Theorically,
assuming the data and the load is spread on the cluster (*1)
from your saying, with each request at 2ms avg (*2)
you should have 500 req/sec in each thread,
40 threads should go 20k req/sec on each client VM stress application (*3)
and 10 client VMs should go 200k req/sec on the whole cluster (*4)

=
(*1) the partition key (first PK column) must spread data on all nodes
and your testing code must spread the load by selecting evenly spread data.
(This point is very important: can you give information on your schema 
and your data ?)


(*2) to achieve better single client throughtput, may be you could 
prepare the requests, since you are always executing the same requests


(*3) => run more client tests application on each VM

(*4) add more client  VMs (Patrick's suggestion)

with (3) and (4) the throughput of each client will not be better, but 
the global cluster throughput will.

=

There are other factors to take into account if you are also writing to 
the cluster : read path, tombstones, replication, repairs etc. but 
that's not the case here?


Performance testing goes to the limit of our understanding of the system
 and is very difficult
... hence interesting :)



--
best,
Alain



Re: How can I scale my read rate?

2017-03-18 Thread S G
Forgot to mention that this vmstat picture is for the client-cluster
reading from Cassandra.

On Sat, Mar 18, 2017 at 6:47 PM, S G <sg.online.em...@gmail.com> wrote:

> ok, I gave the executeAsync() a try.
> Good part is that it was really easy to write the code for that.
> Bad part is that it did not had a huge effect on my throughput - I gained
> about 5% increase in throughput.
> I suspect it is so because my queries are all get-by-primary-key queries
> and were anyways completing in less than 2 milliseconds.
> So there was not much wait to begin with.
>
>
> Here is my code:
>
> String getByKeyQueryStr = "Select * from fooTable where key = " + key;
> //ResultSet result = session.execute(getByKeyQueryStr);  // Previous code
> ResultSetFuture future = session.executeAsync(getByKeyQueryStr);
> FutureCallback callback = new MyFutureCallback();
> executor = MoreExecutors.sameThreadExecutor();
> //executor = Executors.newFixedThreadPool(3); // Tried this too, no effect
> //executor = Executors.newFixedThreadPool(10); // Tried this too, no
> effect
> Futures.addCallback(future, callback, executor);
>
> Can I improve the above code in some way?
> Are there any JMX metrics that can tell me what's going on?
>
> From the vmstat command, I see that CPU idle time is about 70% even though
> I am running about 60 threads per VM
> Total 20 client-VMs with 8 cores each are querying a Cassandra cluster
> with 16 VMs, 8-core each too.
>
>
> ​
> ​
>
>
> Thanks
> SG
>
>
> On Sat, Mar 18, 2017 at 5:38 PM, S G <sg.online.em...@gmail.com> wrote:
>
>> Thanks. It seems that you guys have found executeAsync to yield good
>> results.
>> I want to share my understanding how this could benefit performance and
>> some validation from the group will be awesome.
>>
>> I will call executeAsync() each time I want to get by primary-key.
>> That ways, my client thread is not blocked anymore and I can submit a lot
>> more requests per unit time.
>> The async requests get piled on the underlying Netty I/O thread which
>> ensures that it is always busy all the time.
>> Earlier, the Netty I/O thread would have wasted some cycles when the
>> sync-execute method was processing the results.
>> And earlier, the client thread would also have wasted some cycles waiting
>> for netty-thread to complete.
>>
>> With executeAsync(), none of them is waiting.
>> Only thing to ensure is that the Netty thread's queue does not grow
>> indefinitely.
>>
>> If the above theory is correct, then it sounds like a really good thing
>> to try.
>> If not, please do share some more details.
>>
>>
>>
>>
>> On Sat, Mar 18, 2017 at 2:00 PM, <j.kes...@enercast.de> wrote:
>>
>>> +1 for executeAsync – had a long time to argue that it’s not bad as with
>>> good old rdbms.
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>> Gesendet von meinem Windows 10 Phone
>>>
>>>
>>>
>>> *Von: *Arvydas Jonusonis <arvydas.jonuso...@gmail.com>
>>> *Gesendet: *Samstag, 18. März 2017 19:08
>>> *An: *user@cassandra.apache.org
>>> *Betreff: *Re: How can I scale my read rate?
>>>
>>>
>>>
>>> ..then you're not taking advantage of request pipelining. Use
>>> executeAsync - this will increase your throughput for sure.
>>>
>>>
>>>
>>> http://www.datastax.com/dev/blog/java-driver-async-queries
>>>
>>>
>>>
>>>
>>>
>>> On Sat, Mar 18, 2017 at 08:00 S G <sg.online.em...@gmail.com> wrote:
>>>
>>> I have enabled JMX but not sure what metrics to look for - they are way
>>> too many of them.
>>>
>>> I am using session.execute(...)
>>>
>>>
>>>
>>>
>>>
>>> On Fri, Mar 17, 2017 at 2:07 PM, Arvydas Jonusonis <
>>> arvydas.jonuso...@gmail.com> wrote:
>>>
>>> It would be interesting to see some of the driver metrics (in your
>>> stress test tool) - if you enable JMX, they should be exposed by default.
>>>
>>> Also, are you using session.execute(..) or session.executeAsync(..) ?
>>>
>>>
>>>
>>>
>>>
>>
>


Re: How can I scale my read rate?

2017-03-18 Thread S G
ok, I gave the executeAsync() a try.
Good part is that it was really easy to write the code for that.
Bad part is that it did not had a huge effect on my throughput - I gained
about 5% increase in throughput.
I suspect it is so because my queries are all get-by-primary-key queries
and were anyways completing in less than 2 milliseconds.
So there was not much wait to begin with.


Here is my code:

String getByKeyQueryStr = "Select * from fooTable where key = " + key;
//ResultSet result = session.execute(getByKeyQueryStr);  // Previous code
ResultSetFuture future = session.executeAsync(getByKeyQueryStr);
FutureCallback callback = new MyFutureCallback();
executor = MoreExecutors.sameThreadExecutor();
//executor = Executors.newFixedThreadPool(3); // Tried this too, no effect
//executor = Executors.newFixedThreadPool(10); // Tried this too, no effect
Futures.addCallback(future, callback, executor);

Can I improve the above code in some way?
Are there any JMX metrics that can tell me what's going on?

>From the vmstat command, I see that CPU idle time is about 70% even though
I am running about 60 threads per VM
Total 20 client-VMs with 8 cores each are querying a Cassandra cluster with
16 VMs, 8-core each too.


​
​


Thanks
SG


On Sat, Mar 18, 2017 at 5:38 PM, S G <sg.online.em...@gmail.com> wrote:

> Thanks. It seems that you guys have found executeAsync to yield good
> results.
> I want to share my understanding how this could benefit performance and
> some validation from the group will be awesome.
>
> I will call executeAsync() each time I want to get by primary-key.
> That ways, my client thread is not blocked anymore and I can submit a lot
> more requests per unit time.
> The async requests get piled on the underlying Netty I/O thread which
> ensures that it is always busy all the time.
> Earlier, the Netty I/O thread would have wasted some cycles when the
> sync-execute method was processing the results.
> And earlier, the client thread would also have wasted some cycles waiting
> for netty-thread to complete.
>
> With executeAsync(), none of them is waiting.
> Only thing to ensure is that the Netty thread's queue does not grow
> indefinitely.
>
> If the above theory is correct, then it sounds like a really good thing to
> try.
> If not, please do share some more details.
>
>
>
>
> On Sat, Mar 18, 2017 at 2:00 PM, <j.kes...@enercast.de> wrote:
>
>> +1 for executeAsync – had a long time to argue that it’s not bad as with
>> good old rdbms.
>>
>>
>>
>>
>>
>>
>>
>> Gesendet von meinem Windows 10 Phone
>>
>>
>>
>> *Von: *Arvydas Jonusonis <arvydas.jonuso...@gmail.com>
>> *Gesendet: *Samstag, 18. März 2017 19:08
>> *An: *user@cassandra.apache.org
>> *Betreff: *Re: How can I scale my read rate?
>>
>>
>>
>> ..then you're not taking advantage of request pipelining. Use
>> executeAsync - this will increase your throughput for sure.
>>
>>
>>
>> http://www.datastax.com/dev/blog/java-driver-async-queries
>>
>>
>>
>>
>>
>> On Sat, Mar 18, 2017 at 08:00 S G <sg.online.em...@gmail.com> wrote:
>>
>> I have enabled JMX but not sure what metrics to look for - they are way
>> too many of them.
>>
>> I am using session.execute(...)
>>
>>
>>
>>
>>
>> On Fri, Mar 17, 2017 at 2:07 PM, Arvydas Jonusonis <
>> arvydas.jonuso...@gmail.com> wrote:
>>
>> It would be interesting to see some of the driver metrics (in your stress
>> test tool) - if you enable JMX, they should be exposed by default.
>>
>> Also, are you using session.execute(..) or session.executeAsync(..) ?
>>
>>
>>
>>
>>
>


Re: How can I scale my read rate?

2017-03-18 Thread S G
Thanks. It seems that you guys have found executeAsync to yield good
results.
I want to share my understanding how this could benefit performance and
some validation from the group will be awesome.

I will call executeAsync() each time I want to get by primary-key.
That ways, my client thread is not blocked anymore and I can submit a lot
more requests per unit time.
The async requests get piled on the underlying Netty I/O thread which
ensures that it is always busy all the time.
Earlier, the Netty I/O thread would have wasted some cycles when the
sync-execute method was processing the results.
And earlier, the client thread would also have wasted some cycles waiting
for netty-thread to complete.

With executeAsync(), none of them is waiting.
Only thing to ensure is that the Netty thread's queue does not grow
indefinitely.

If the above theory is correct, then it sounds like a really good thing to
try.
If not, please do share some more details.




On Sat, Mar 18, 2017 at 2:00 PM, <j.kes...@enercast.de> wrote:

> +1 for executeAsync – had a long time to argue that it’s not bad as with
> good old rdbms.
>
>
>
>
>
>
>
> Gesendet von meinem Windows 10 Phone
>
>
>
> *Von: *Arvydas Jonusonis <arvydas.jonuso...@gmail.com>
> *Gesendet: *Samstag, 18. März 2017 19:08
> *An: *user@cassandra.apache.org
> *Betreff: *Re: How can I scale my read rate?
>
>
>
> ..then you're not taking advantage of request pipelining. Use executeAsync
> - this will increase your throughput for sure.
>
>
>
> http://www.datastax.com/dev/blog/java-driver-async-queries
>
>
>
>
>
> On Sat, Mar 18, 2017 at 08:00 S G <sg.online.em...@gmail.com> wrote:
>
> I have enabled JMX but not sure what metrics to look for - they are way
> too many of them.
>
> I am using session.execute(...)
>
>
>
>
>
> On Fri, Mar 17, 2017 at 2:07 PM, Arvydas Jonusonis <
> arvydas.jonuso...@gmail.com> wrote:
>
> It would be interesting to see some of the driver metrics (in your stress
> test tool) - if you enable JMX, they should be exposed by default.
>
> Also, are you using session.execute(..) or session.executeAsync(..) ?
>
>
>
>
>


AW: How can I scale my read rate?

2017-03-18 Thread j.kesten
+1 for executeAsync – had a long time to argue that it’s not bad as with good 
old rdbms. 



Gesendet von meinem Windows 10 Phone

Von: Arvydas Jonusonis
Gesendet: Samstag, 18. März 2017 19:08
An: user@cassandra.apache.org
Betreff: Re: How can I scale my read rate?

..then you're not taking advantage of request pipelining. Use executeAsync - 
this will increase your throughput for sure.

http://www.datastax.com/dev/blog/java-driver-async-queries


On Sat, Mar 18, 2017 at 08:00 S G <sg.online.em...@gmail.com> wrote:
I have enabled JMX but not sure what metrics to look for - they are way too 
many of them.
I am using session.execute(...)


On Fri, Mar 17, 2017 at 2:07 PM, Arvydas Jonusonis 
<arvydas.jonuso...@gmail.com> wrote:
It would be interesting to see some of the driver metrics (in your stress test 
tool) - if you enable JMX, they should be exposed by default.

Also, are you using session.execute(..) or session.executeAsync(..) ?