[
https://issues.apache.org/jira/browse/CASSANDRA-1600?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Sylvain Lebresne updated CASSANDRA-1600:
----------------------------------------
Attachment:
0003-Allow-get_range_slices-to-apply-filter-to-a-sequenti-v4.patch
0002-thrift-generated-code-changes-v4.patch
0001-Add-optional-index-clause-to-KeyRange-v4.patch
Attaching rebased and updated v4 (I've squashed the CQL changes into the 3rd
patch).
bq. What do we gain from "typedefing" List<IndexExpression> to FilterClause? (I
note this was part of Stu and my original attempts back in April but I don't
remember a good reason for that.)
I do not really see a good reason either (I rebased that part without really
thinking about it). I've removed FilterClause in v4.
{quote}
{noformat}
+ /*
+ * XXX: If the range requested is a token range, we'll have to start
at the beginning (and stop at the end) of
+ * the indexed row unfortunately (which will be inefficient), because
we have not way to intuit the small
+ * possible key having a given token. A fix would be to actually store
the token along the key in the
+ * indexed row.
+ */
{noformat}
This is fine since there's no reason to be searching by token unless you're
doing an exhaustive scan, i.e. a m/r job.
{quote}
Actually, I believe this does have some impact. When a requested range of keys
spans multiple nodes, the range is splitted using nodes Token (by
SP.getRestrictedRanges()). And while it is true that in simple cases this
amount to checking all the keys in the node and thus is not inefficient, there
is at least 2 cases where this isn't the case:
* If the ranges a node is responsible of are not contiguous. Say a node is
responsible for range (5, 10] and (20, 30]. And say a user requests a range of
keys such that the start key token is 15 (and the end key token is say 25).
Then the node will end up being requested for keys in (20, 25] (where those are
tokens, not keys). In that case, it will have to uselessly scan (and skip) all
keys in (0, 10] the node has.
* When token range wraps. This is actually the same problem than above in that
when a node has a range like (2^127-1, 2^126], it really hold both ranges (0,
2^126] and (2^127, 2^127-1] which are non contiguous as far as the ordering on
disk is concerned.
While the 'the strategy creates non-contiguous ranges' problem is not a huge
deal since neither SimpleStrategy nor NTS creates them (if I'm correct), the
instance of the problem due to wrapping ranges is a very real inefficiency of
the implementation.
As a side note, storing the token as suggested in the comments above would also
likely have the benefit of helping a good deal the problem of the time spent
doing digest computations noted in CASSANDRA-3545.
{quote}
{noformat}
+
rows.addAll(RangeSliceVerbHandler.executeLocally(command));
{noformat}
Another place the original patches failed... we should avoid this because it
means we're now allowing one range scan per thrift client instead of one per
read stage thread, and it bypasses the "drop hopeless requests" overcapacity
protection built in there. Look at SP.LocalReadRunnable for how to do this
safely. Simplest fix would be to just continue routing all range scans over
MessagingService.
{quote}
While I agree with you, I'll note that this is not a problem introduced by this
patch: the current code does a call to CFS.getRangeSlice() directly in that
case (bypassing the read stage and the protections coming with it). We clearly
should fix that but it's neither a detail nor related to this patch so I
suggest spawning another ticket for that instead.
> Merge get_indexed_slices with get_range_slices
> ----------------------------------------------
>
> Key: CASSANDRA-1600
> URL: https://issues.apache.org/jira/browse/CASSANDRA-1600
> Project: Cassandra
> Issue Type: New Feature
> Components: API
> Reporter: Stu Hood
> Assignee: Sylvain Lebresne
> Fix For: 1.1
>
> Attachments:
> 0001-Add-optional-FilterClause-to-KeyRange-and-support-do-v2.patch,
> 0001-Add-optional-FilterClause-to-KeyRange-and-support-doin.txt,
> 0001-Add-optional-FilterClause-to-KeyRange-v3.patch,
> 0001-Add-optional-index-clause-to-KeyRange-v4.patch,
> 0002-allow-get_range_slices-to-apply-filter-to-a-sequenti-v2.patch,
> 0002-allow-get_range_slices-to-apply-filter-to-a-sequential.txt,
> 0002-thrift-generated-code-changes-v3.patch,
> 0002-thrift-generated-code-changes-v4.patch,
> 0003-Allow-get_range_slices-to-apply-filter-to-a-sequenti-v3.patch,
> 0003-Allow-get_range_slices-to-apply-filter-to-a-sequenti-v4.patch,
> 0004-Update-cql-to-not-use-deprecated-index-scan-v3.patch
>
>
> From a comment on 1157:
> {quote}
> IndexClause only has a start key for get_indexed_slices, but it would seem
> that the reasoning behind using 'KeyRange' for get_range_slices applies there
> as well, since if you know the range you care about in the primary index, you
> don't want to continue scanning until you exhaust 'count' (or the cluster).
> Since it would appear that get_indexed_slices would benefit from a KeyRange,
> why not smash get_(range|indexed)_slices together, and make IndexClause an
> optional field on KeyRange?
> {quote}
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators:
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira