[jira] [Comment Edited] (CASSANDRA-6976) Determining replicas to query is very slow with large numbers of nodes or vnodes

Ariel Weisberg (JIRA) Mon, 01 Dec 2014 11:10:11 -0800

    [ 
https://issues.apache.org/jira/browse/CASSANDRA-6976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14230244#comment-14230244
 ]


Ariel Weisberg edited comment on CASSANDRA-6976 at 12/1/14 7:08 PM:
--------------------------------------------------------------------

bq. Sure it does - if an action that is likely memory bound (like this one - 
after all, i
The entire thing runs in 60 milliseconds with 2000 tokens. That is 2x the time 
to warm up the cache (assuming a correct number for warmup). So warming up the 
cache is definitely impacting the numbers, but not changing them from 100s of 
milliseconds to 10s. Tack the time to warm up the last-level cache onto the 
current time and it is still the same order of magnitude.

bq. For a lookup (i.e. small) table query, or a range query that can be 
serviced entirely by the local node, it is quite unlikely that the fetching 
would dominate when talking about timescales >= 1ms.
Range queries are slow because they produce a lot of ranges. That means 
contacting a lot of nodes. The cost of getRestrictedRanges is proportional to 
the cost of getRangeSlice, but still a small part of overall execution time.

If the lookup table really only needed to contact one node, getRestrictedRanges 
wouldn't run for long and would return a small set of ranges, right?
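
To make the range-count argument concrete, here is a minimal sketch. It is not 
Cassandra's actual getRestrictedRanges; the class RestrictedRangesSketch, the 
Range type, and splitByRing are hypothetical names, tokens are plain longs, and 
wrap-around ranges are ignored. It only shows that the number of sub-ranges 
produced grows with the number of ring tokens the query range covers, and 
collapses to one when the query fits between two adjacent tokens.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// Illustrative sketch only: hypothetical names, not the Cassandra implementation.
final class RestrictedRangesSketch {

    /** A non-wrapping range (left, right]; a simplifying assumption for this sketch. */
    static final class Range {
        final long left;
        final long right;

        Range(long left, long right) {
            this.left = left;
            this.right = right;
        }
    }

    /** Splits the query range at every ring token that falls strictly inside it. */
    static List<Range> splitByRing(TreeSet<Long> ringTokens, Range query) {
        List<Range> out = new ArrayList<>();
        long left = query.left;
        // Each token strictly between left and right cuts the remaining range in two.
        for (long token : ringTokens.subSet(query.left, false, query.right, false)) {
            out.add(new Range(left, token));
            left = token;
        }
        out.add(new Range(left, query.right));
        return out;
    }

    public static void main(String[] args) {
        // 2000 evenly spaced tokens, standing in for a ring with many vnodes.
        TreeSet<Long> ring = new TreeSet<>();
        for (long t = 0; t < 2000; t++) {
            ring.add(t * 1000);
        }
        // A full-ring query is cut at every token.
        System.out.println(splitByRing(ring, new Range(Long.MIN_VALUE, Long.MAX_VALUE)).size());
        // A narrow query between two adjacent tokens stays a single range.
        System.out.println(splitByRing(ring, new Range(1500, 1700)).size());
    }
}
```

Under these assumptions the work done is linear in the number of tokens covered, 
so a query that only touches one segment of the ring pays almost nothing.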

bq. Like I said, please do feel to drop this particular line of enquiry for the 
moment, ...
What you're describing is that it's bad in production and we just don't see it 
in test. I don't see a reason to drop it just because the ticket got caught up 
in implementation details rather than the user-facing issue we want to address.  
[~jbellis]?

bq. In the meantime it might be worth having a simple short-circuit path for 
queries that may be answered by the local node only, though.
What queries could identify that this shortcut is possible? By nature those 
queries would only hit one local node if they didn't cover a lot of ranges, in 
which case all the problem code we are discussing runs relatively fast 
(compared to its worst case).


was (Author: aweisberg):
bq. Sure it does - if an action that is likely memory bound (like this one - 
after all, i
The entire thing runs in 60 milliseconds with 2000 tokens. That is 2x the time 
to warm up the cache (assuming a correct number for warmup). So warming up the 
cache is definitely impacting the numbers, but not changing them from 100s of 
milliseconds to 10s. Tack the time to warm up the last-level cache onto the 
current time and it is still the same order of magnitude. We could do the cache 
optimization thing and then find out that in practice the cache is not 
beneficial anyway.

bq. For a lookup (i.e. small) table query, or a range query that can be 
serviced entirely by the local node, it is quite unlikely that the fetching 
would dominate when talking about timescales >= 1ms.
Range queries are slow because they produce a lot of ranges. That means 
contacting a lot of nodes. The cost of getRestrictedRanges is proportional to 
the cost of getRangeSlice, but still a small part of overall execution time.

If the lookup table really only needed to contact one node, getRestrictedRanges 
wouldn't run for long and would return a small set of ranges, right?

bq. Like I said, please do feel to drop this particular line of enquiry for the 
moment, ...
What you're describing is that it's bad in production and we just don't see it 
in test. I don't see a reason to drop it just because the ticket got caught up 
in implementation details rather than the user-facing issue we want to address.  
[~jbellis]?

bq. In the meantime it might be worth having a simple short-circuit path for 
queries that may be answered by the local node only, though.
What queries could identify that this shortcut is possible? By nature those 
queries would only hit one local node if they didn't cover a lot of ranges, in 
which case all the problem code we are discussing runs relatively fast 
(compared to its worst case).

> Determining replicas to query is very slow with large numbers of nodes or 
> vnodes
> --------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-6976
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-6976
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>            Reporter: Benedict
>            Assignee: Ariel Weisberg
>              Labels: performance
>         Attachments: GetRestrictedRanges.java, jmh_output.txt, 
> jmh_output_murmur3.txt, make_jmh_work.patch
>
>
> As described in CASSANDRA-6906, this can be ~100ms for a relatively small 
> cluster with vnodes, which is longer than it will spend in transit on the 
> network. This should be much faster.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
