[jira] [Updated] (CASSANDRA-1600) Merge get_indexed_slices with get_range_slices

Sylvain Lebresne (Updated) (JIRA) Mon, 02 Jan 2012 06:05:01 -0800

     [ 
https://issues.apache.org/jira/browse/CASSANDRA-1600?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Sylvain Lebresne updated CASSANDRA-1600:
----------------------------------------

    Attachment: 
0003-Allow-get_range_slices-to-apply-filter-to-a-sequenti-v4.patch
                0002-thrift-generated-code-changes-v4.patch
                0001-Add-optional-index-clause-to-KeyRange-v4.patch

Attaching rebased and updated v4 (I've squashed the CQL changes into the 3rd 
patch).

bq. What do we gain from "typedefing" List<IndexExpression> to FilterClause? (I 
note this was part of Stu and my original attempts back in April but I don't 
remember a good reason for that.)

I do not really see a good reason either (I rebased that part without really 
thinking about it). I've removed FilterClause in v4.

{quote}
{noformat}
+        /*
+         * XXX: If the range requested is a token range, we'll have to start 
at the beginning (and stop at the end) of
+         * the indexed row unfortunately (which will be inefficient), because 
we have not way to intuit the small
+         * possible key having a given token. A fix would be to actually store 
the token along the key in the
+         * indexed row.
+         */
{noformat}
This is fine since there's no reason to be searching by token unless you're 
doing an exhaustive scan, i.e. a m/r job.
{quote}

Actually, I believe this does have some impact. When a requested range of keys 
spans multiple nodes, the range is splitted using nodes Token (by 
SP.getRestrictedRanges()). And while it is true that in simple cases this 
amount to checking all the keys in the node and thus is not inefficient, there 
is at least 2 cases where this isn't the case:
* If the ranges a node is responsible of are not contiguous. Say a node is 
responsible for range (5, 10] and (20, 30]. And say a user requests a range of 
keys such that the start key token is 15 (and the end key token is say 25). 
Then the node will end up being requested for keys in (20, 25] (where those are 
tokens, not keys). In that case, it will have to uselessly scan (and skip) all 
keys in (0, 10] the node has.
* When token range wraps. This is actually the same problem than above in that 
when a node has a range like (2^127-1, 2^126], it really hold both ranges (0, 
2^126] and (2^127, 2^127-1] which are non contiguous as far as the ordering on 
disk is concerned.
While the 'the strategy creates non-contiguous ranges' problem is not a huge 
deal since neither SimpleStrategy nor NTS creates them (if I'm correct), the 
instance of the problem due to wrapping ranges is a very real inefficiency of 
the implementation.

As a side note, storing the token as suggested in the comments above would also 
likely have the benefit of helping a good deal the problem of the time spent 
doing digest computations noted in CASSANDRA-3545.

{quote}
{noformat}
+                        
rows.addAll(RangeSliceVerbHandler.executeLocally(command));
{noformat}
Another place the original patches failed... we should avoid this because it 
means we're now allowing one range scan per thrift client instead of one per 
read stage thread, and it bypasses the "drop hopeless requests" overcapacity 
protection built in there. Look at SP.LocalReadRunnable for how to do this 
safely. Simplest fix would be to just continue routing all range scans over 
MessagingService.
{quote}

While I agree with you, I'll note that this is not a problem introduced by this 
patch: the current code does a call to CFS.getRangeSlice() directly in that 
case (bypassing the read stage and the protections coming with it). We clearly 
should fix that but it's neither a detail nor related to this patch so I 
suggest spawning another ticket for that instead.
                
> Merge get_indexed_slices with get_range_slices
> ----------------------------------------------
>
>                 Key: CASSANDRA-1600
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1600
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: API
>            Reporter: Stu Hood
>            Assignee: Sylvain Lebresne
>             Fix For: 1.1
>
>         Attachments: 
> 0001-Add-optional-FilterClause-to-KeyRange-and-support-do-v2.patch, 
> 0001-Add-optional-FilterClause-to-KeyRange-and-support-doin.txt, 
> 0001-Add-optional-FilterClause-to-KeyRange-v3.patch, 
> 0001-Add-optional-index-clause-to-KeyRange-v4.patch, 
> 0002-allow-get_range_slices-to-apply-filter-to-a-sequenti-v2.patch, 
> 0002-allow-get_range_slices-to-apply-filter-to-a-sequential.txt, 
> 0002-thrift-generated-code-changes-v3.patch, 
> 0002-thrift-generated-code-changes-v4.patch, 
> 0003-Allow-get_range_slices-to-apply-filter-to-a-sequenti-v3.patch, 
> 0003-Allow-get_range_slices-to-apply-filter-to-a-sequenti-v4.patch, 
> 0004-Update-cql-to-not-use-deprecated-index-scan-v3.patch
>
>
> From a comment on 1157:
> {quote}
> IndexClause only has a start key for get_indexed_slices, but it would seem 
> that the reasoning behind using 'KeyRange' for get_range_slices applies there 
> as well, since if you know the range you care about in the primary index, you 
> don't want to continue scanning until you exhaust 'count' (or the cluster).
> Since it would appear that get_indexed_slices would benefit from a KeyRange, 
> why not smash get_(range|indexed)_slices together, and make IndexClause an 
> optional field on KeyRange?
> {quote}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (CASSANDRA-1600) Merge get_indexed_slices with get_range_slices

Reply via email to