[
https://issues.apache.org/jira/browse/CASSANDRA-1337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13422986#comment-13422986
]
Sylvain Lebresne commented on CASSANDRA-1337:
---------------------------------------------
The patch essentially moves the check for "do we have enough rows for the query"
inside the "don't use the local path for that range" branch, which is broken (as in
"uselessly inefficient"), because it means that even if we do have enough rows
locally, we will still query another range.
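To make that concrete, here is a minimal sketch of the loop shape I have in mind;
Range, Row and queryRange() are made-up stand-ins, not the actual StorageProxy code:
{code}
// Illustrative only: Range, Row and queryRange() are stand-ins for the real types.
static List<Row> scanRanges(List<Range> rangesInPartitionerOrder, int maxResults)
{
    List<Row> rows = new ArrayList<Row>();
    for (Range range : rangesInPartitionerOrder)
    {
        // The "enough rows" check must run for every range, before we query it
        // at all: if the local range already produced enough rows, stop here...
        if (rows.size() >= maxResults)
            break;

        // ...rather than checking only inside the "range is not local" branch,
        // which is effectively what the patch does and why it can uselessly
        // query one more range even when the local one was sufficient.
        rows.addAll(queryRange(range, maxResults - rows.size()));
    }
    return rows;
}
{code}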
But probably more importantly, I'm confused about how this patch works.
First, it estimates how many rows the query of *a range* will likely yield based on
the *total* number of keys on the node. We should at least divide this by the
replication factor to get a proper estimate of the number of keys per range.
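In code, the kind of estimate I mean would look roughly like this (purely
illustrative, self-contained names, not actual APIs):
{code}
// Illustrative sketch: estimate how many rows one range will likely yield,
// and derive the number of ranges to query in parallel.
static int concurrencyFactor(long totalKeysOnNode, int replicationFactor, int maxResults)
{
    // The node stores totalKeysOnNode keys spread over roughly replicationFactor
    // ranges, so a single range holds about this many keys:
    double estimatedKeysPerRange = (double) totalKeysOnNode / replicationFactor;

    // Number of ranges we should query in parallel to likely reach maxResults.
    return (int) Math.ceil(maxResults / Math.max(1.0, estimatedKeysPerRange));
}
{code}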
Second, it seems to correctly parallelize "old-style" range_slice (i.e.
range_slice for thrift without any IndexExpression), but I think it correctly
handles neither secondary indexes (which was the main goal of the patch, I
believe) nor any kind of range slice when CQL3 is involved:
* For secondary index queries, the number of rows returned doesn't depend at
all on the number of row keys the node holds (and thus it follows that
estimating the number of parallel queries based on that parameter is broken);
it depends on how many columns the row for the most selective index contains.
So for the concurrencyFactor we should 1) figure out which index will be used
and 2) probably use the estimated mean columns count for that index (see the
sketch after this list).
* For CQL3 queries, they use the maxIsColumns parameter (for both traditional
*and* 2ndary index queries), so the number of rows returned shouldn't be
directly compared to maxResults (i.e. if the first row we find has enough
columns to satisfy maxResults, we're done). In that case it unfortunately
becomes more complicated to predict how many "results" a query might yield in
general, because this depends on the column filter. I.e. if the filter is a
name filter, or an "identity" slice filter (as in IdentitySliceFilter), we can
attempt an estimate (in the latter case, something like maxResults /
(estimatedKeys * meanColumnsCountPerKey)), but for other kinds of slice filters
I don't think we can estimate much. That being said, it might still be worth
using the estimate in those two cases, because at least for 2ndary index
queries the column filter will very often be an "identity" slice. And in that
"identity" slice case, for 2ndary index and CQL3, the estimate should be
something like maxResults / (meanColumnCountPerKey(index used) *
meanColumnCountPerKey(parent cf)); both estimates are illustrated in the sketch
below.
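To illustrate both bullets, a rough sketch of what those two estimates could look
like (all names are made up for the example, not actual Cassandra APIs):
{code}
// Illustrative only: none of these names are actual Cassandra APIs.

// Secondary index case: the yield of a range depends on how many entries the
// row of the most selective index holds, not on how many row keys the node has.
static int concurrencyFactorForIndex(long meanColumnsCountOfIndex, int maxResults)
{
    return (int) Math.ceil((double) maxResults / Math.max(1, meanColumnsCountOfIndex));
}

// CQL3 "identity" slice over a 2ndary index: maxResults counts columns, and each
// index entry expands to roughly the mean columns count of the parent CF, hence
// maxResults / (meanColumnCountPerKey(index used) * meanColumnCountPerKey(parent cf)).
static int concurrencyFactorForCql3IndexQuery(long meanColumnsCountOfIndex,
                                              long meanColumnsCountOfParentCf,
                                              int maxResults)
{
    double expectedColumnsPerRange =
        (double) meanColumnsCountOfIndex * meanColumnsCountOfParentCf;
    return (int) Math.ceil(maxResults / Math.max(1.0, expectedColumnsPerRange));
}
{code}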
I'll note however that in both cases the meanColumnsCount is not necessarily the
perfect estimate to use, as it pretty much implies that half the time we will
query one more range than necessary. Instead we could either use the
maxColumnsCount (if we really want to be conservative) or add some fudge factor
to the mean, a fudge factor that could perhaps be based on the difference between
the min, mean and max columns count estimates.
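For instance (again purely illustrative, with an arbitrary spread fraction), such
a fudged estimate could look like:
{code}
// Illustrative sketch of a fudged columns-count estimate; the 0.25 fraction of
// the min-to-max spread is an arbitrary example value.
static double fudgedColumnsCountEstimate(long minColumnsCount, long meanColumnsCount, long maxColumnsCount)
{
    // Pad the mean by a fraction of the observed min-to-max spread, so that we
    // less often end up querying one more range than necessary.
    return meanColumnsCount + 0.25 * (maxColumnsCount - minColumnsCount);
}
{code}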
> parallelize fetching rows for low-cardinality indexes
> -----------------------------------------------------
>
> Key: CASSANDRA-1337
> URL: https://issues.apache.org/jira/browse/CASSANDRA-1337
> Project: Cassandra
> Issue Type: Improvement
> Reporter: Jonathan Ellis
> Assignee: David Alves
> Priority: Minor
> Fix For: 1.2
>
> Attachments:
> 0001-CASSANDRA-1337-scan-concurrently-depending-on-num-rows.txt,
> CASSANDRA-1337.patch
>
> Original Estimate: 8h
> Remaining Estimate: 8h
>
> currently, we read the indexed rows from the first node (in partitioner
> order); if that does not have enough matching rows, we read the rows from the
> next, and so forth.
> we should use the statistics from CASSANDRA-1155 to query multiple nodes in
> parallel, such that we have a high chance of getting enough rows w/o having
> to do another round of queries (but, if our estimate is incorrect, we do need
> to loop and do more rounds until we have enough data or we have fetched from
> each node).