Re: How to monitor datastax driver compression performance?

2019-04-09 Thread Jon Haddad
tlp-stress has support for customizing payloads, but it's not
documented very well.  For a given data model (say the KeyValue one),
you can override what tlp-stress will send over.  By default it's
pretty small, a handful of bytes.

To do that, pass --field.keyvalue.value (the table name + the field name)
set to the custom field generator you'd like to use.  For example,
--field.keyvalue.value='random(1,11000)' will generate 10K random
characters.  You can also generate text from real words by using the
book(100,200) function (100-200 random words out of books) if you want
something that will compress better.
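
To illustrate why word-based payloads matter for a compression test, here is a minimal Python sketch comparing how well random characters and repeated real words compress. It is not tlp-stress code: the vocabulary is made up for the example, and zlib is used only as a stand-in from the standard library (the driver itself negotiates LZ4 or Snappy, so actual ratios will differ).

```python
import random
import string
import zlib


def compression_ratio(payload: str) -> float:
    """Return uncompressed size divided by zlib-compressed size."""
    raw = payload.encode("utf-8")
    return len(raw) / len(zlib.compress(raw))


# Fixed seed so repeated runs produce the same payloads.
rng = random.Random(42)

# Roughly what a random-character generator would produce: 10,000 letters.
random_chars = "".join(rng.choices(string.ascii_letters, k=10_000))

# Hypothetical vocabulary standing in for words taken out of books.
vocabulary = ["cassandra", "driver", "compression", "latency",
              "network", "payload", "cluster", "request"]
word_text = " ".join(rng.choices(vocabulary, k=1_500))

random_ratio = compression_ratio(random_chars)
word_ratio = compression_ratio(word_text)

print(f"random characters: {random_ratio:.2f}x")
print(f"word-based text:   {word_ratio:.2f}x")
```

If the payload in a benchmark barely compresses, driver compression mostly costs CPU without saving much bandwidth, so the choice of generator can change the conclusion.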

You can see a (poorly formatted) list of all the customizations you
can do by running `tlp-stress fields`

This is one of the areas I haven't spent enough time on to share with the
world in a carefree manner, but it works.  If you're willing to
overlook the poor docs in the area I think it might meet your needs.

Regarding compression at the query level vs not, I think you should
look at the overhead first.  I'm betting you'll find it's
insignificant.  That said, you can always create two cluster objects
with two radically different settings if you find you need it.


-
To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
For additional commands, e-mail: user-h...@cassandra.apache.org



Re: How to monitor datastax driver compression performance?

2019-04-09 Thread Gabriel Giussi
Does tlp-stress allow us to define the size of rows? I will see the benefit
of compression in terms of request rates only if the compression ratio is
significant, i.e. if it requires fewer network round trips.
Could this be done by generating bigger partitions with the -n and -p
parameters, i.e. by decreasing -p?

Also, don't you think the driver should allow configuring compression per
query? One table with wide rows could benefit from compression
while another one with smaller payloads might not.

Thanks for your help Jon.




Re: How to monitor datastax driver compression performance?

2019-04-08 Thread Jon Haddad
If it were me, I'd look at raw request rates (in terms of requests /
second as well as request latency), network throughput and then some
flame graphs of both the server and your application:
https://github.com/jvm-profiling-tools/async-profiler.

I've created an issue in tlp-stress to add compression options for the
driver: https://github.com/thelastpickle/tlp-stress/issues/67.  If
you're interested in contributing the feature I think tlp-stress will
more or less solve the remainder of the problem for you (the load
part, not the OS numbers).

Jon








How to monitor datastax driver compression performance?

2019-04-08 Thread Gabriel Giussi
Hi, I'm trying to test whether adding driver compression will bring me any
benefit.
I understand that the trade-off is less bandwidth but increased CPU usage
in both Cassandra nodes (compression) and client nodes (decompression), but
I want to know what the key metrics are and how to monitor them to prove
that compression is giving good results.
I guess I should look at the latency percentiles reported by
com.datastax.driver.core.Metrics and at CPU usage, but what about bandwidth
usage and compression ratio?
Should I use tcpdump to capture the lengths of packets coming from the
Cassandra nodes? Would something like
tcpdump -n "src port 9042 and tcp[13] & 8 != 0" | sed -n "s/^.*length \(.*\).*$/\1/p"
be enough?

Thanks
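
As a possible alternative to the sed step in the pipeline above, the following Python sketch extracts the trailing "length N" value from tcpdump-style lines and totals it. The sample lines are invented for illustration and only assume the usual `..., length N` suffix that tcpdump prints for TCP segments; the function names are hypothetical.

```python
import re
from typing import Dict, Iterable, List

# tcpdump normally ends each TCP line with "length <payload bytes>".
LENGTH_RE = re.compile(r"\blength (\d+)")


def payload_lengths(lines: Iterable[str]) -> List[int]:
    """Collect the last 'length N' value from every line that has one."""
    lengths = []
    for line in lines:
        matches = LENGTH_RE.findall(line)
        if matches:
            lengths.append(int(matches[-1]))
    return lengths


def summarize(lengths: List[int]) -> Dict[str, float]:
    """Return packet count, total bytes and mean bytes per packet."""
    total = sum(lengths)
    mean = total / len(lengths) if lengths else 0.0
    return {"packets": len(lengths), "total_bytes": total, "mean_bytes": mean}


# Made-up sample input, not a real capture.
sample = [
    "12:00:00.000001 IP 10.0.0.1.9042 > 10.0.0.2.51234: Flags [P.], "
    "seq 1:101, ack 1, win 229, length 100",
    "12:00:00.000002 IP 10.0.0.1.9042 > 10.0.0.2.51234: Flags [P.], "
    "seq 101:401, ack 1, win 229, length 300",
    "2 packets captured",
]

summary = summarize(payload_lengths(sample))
print(summary)
```

Comparing total_bytes for the same workload with and without compression gives a rough compression ratio. Keep in mind that the PSH-only filter skips segments without that flag, so large responses split across several segments may be undercounted.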