[ 
https://issues.apache.org/jira/browse/CASSANDRA-8717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14487645#comment-14487645
 ] 

Andrés de la Peña commented on CASSANDRA-8717:
----------------------------------------------

It would be nice to have these changes in 2.1. I'm uploading a new version of 
the patch.

[~slebresne], as you suggested, I have renamed the {{sort}} method to 
{{postReconciliationProcessing}}. I have also renamed the 
{{requiresFullScan}} method to {{requiresScanningAllRanges}}, which seems clearer.

You are totally right about the computation of the concurrency factor. I have 
created a {{rowsToBeFetched}} variable representing the number of rows to be 
fetched. This is {{command.limit()}} in the regular case and {{command.limit() 
* ranges.size()}} when the command requires scanning all the token ranges. In 
addition, if we know that the command needs to scan all the ranges, we can set 
the concurrency factor to {{ranges.size()}} in order to query all the ranges 
in parallel. Thus, the concurrency factor does not need to be recalculated in 
this particular case.
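
As a rough illustration of that logic, here is a simplified sketch. It is not the code in the patch: the class name, the method names and the estimate-based fallback for the regular case are assumptions made only for this example.

```java
// Simplified, standalone sketch of the computation described above.
// All names are hypothetical; this is not the StorageProxy code.
final class ConcurrencyFactorSketch
{
    // Rows to fetch: the limit in the regular case, or the limit per range
    // multiplied by the number of ranges when every range must be scanned.
    static long rowsToBeFetched(int limit, int rangeCount, boolean requiresScanningAllRanges)
    {
        return requiresScanningAllRanges ? (long) limit * rangeCount : limit;
    }

    // Initial concurrency factor. When all ranges must be scanned, query them
    // all in parallel. Otherwise (assumed heuristic) derive the factor from an
    // estimate of the rows returned per range, clamped to [1, rangeCount].
    static int initialConcurrencyFactor(long rowsToBeFetched,
                                        double estimatedRowsPerRange,
                                        int rangeCount,
                                        boolean requiresScanningAllRanges)
    {
        if (requiresScanningAllRanges)
            return rangeCount;
        if (estimatedRowsPerRange <= 0.0)
            return 1;
        int byEstimate = (int) Math.ceil(rowsToBeFetched / estimatedRowsPerRange);
        return Math.max(1, Math.min(rangeCount, byEstimate));
    }

    public static void main(String[] args)
    {
        long rows = rowsToBeFetched(10, 4, true);
        System.out.println("rows=" + rows + ", factor=" + initialConcurrencyFactor(rows, 2.5, 4, true));
    }
}
```

With this shape, the full-scan case never needs a second pass to adjust the factor, since every range is already queried in the first round.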

Please let me know what you think about the new patch.

> Top-k queries with custom secondary indexes
> -------------------------------------------
>
>                 Key: CASSANDRA-8717
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8717
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Andrés de la Peña
>            Assignee: Andrés de la Peña
>            Priority: Minor
>              Labels: 2i, secondary_index, sort, sorting, top-k
>             Fix For: 3.0
>
>         Attachments: 0001-Add-support-for-top-k-queries-in-2i.patch, 
> 0002-Add-support-for-top-k-queries-in-2i.patch
>
>
> As presented in [Cassandra Summit Europe 
> 2014|https://www.youtube.com/watch?v=Hg5s-hXy_-M], secondary indexes can be 
> modified to support general top-k queries with minimal changes to the Cassandra 
> codebase. This way, custom 2i implementations could provide relevance search, 
> sorting by columns, etc.
> Top-k queries retrieve the k best results for a certain query. That implies 
> querying the k best rows in each token range and then sorting them in order to 
> obtain the k globally best rows. 
> To do that, we propose two additional methods in the class 
> SecondaryIndexSearcher:
> {code:java}
> public boolean requiresFullScan(List<IndexExpression> clause)
> {
>     return false;
> }
> public List<Row> sort(List<IndexExpression> clause, List<Row> rows)
> {
>     return rows;
> }
> {code}
> The first one indicates whether a query performed in the index requires querying 
> all the nodes in the ring. This is necessary in top-k queries because we do not 
> know in advance which nodes hold the best results. The second method specifies 
> how to sort all the partial node results according to the query. 
> Then we add two similar methods to the class AbstractRangeCommand:
> {code:java}
> // Field initialization (e.g. in the constructor): the index searcher, if any.
> this.searcher = Keyspace.open(keyspace).getColumnFamilyStore(columnFamily).indexManager.searcher(rowFilter);
>
> public boolean requiresFullScan()
> {
>     return searcher != null && searcher.requiresFullScan(rowFilter);
> }
>
> public List<Row> combine(List<Row> rows)
> {
>     return searcher == null ? trim(rows) : trim(searcher.sort(rowFilter, rows));
> }
> {code}
> Finally, we modify StorageProxy#getRangeSlice to use the previous methods, as 
> shown in the attached patch.
> We think that the proposed approach provides very useful functionality with 
> minimal impact on the current codebase.
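
The merge step described in the issue (take the k best rows of each token range, then sort them to keep the k globally best) can be sketched roughly as follows. This is only an illustration: the class and method names are hypothetical, and plain lists with a comparator stand in for Cassandra's {{Row}} type and the index-specific ordering.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical, standalone illustration of combining per-range top-k results.
final class TopKMergeSketch
{
    // perRangeResults: the best rows returned by each token range.
    // bestFirst: comparator that orders better results before worse ones.
    // Returns at most k results, globally ordered best first.
    static <T> List<T> mergeTopK(List<List<T>> perRangeResults, Comparator<T> bestFirst, int k)
    {
        List<T> all = new ArrayList<>();
        for (List<T> partial : perRangeResults)
            all.addAll(partial);

        // Re-sort the union of the partial results, then trim to k.
        all.sort(bestFirst);
        int end = Math.max(0, Math.min(k, all.size()));
        return new ArrayList<>(all.subList(0, end));
    }
}
```

Because each range contributes its own k best rows, the global top k is always contained in the union, so sorting and trimming the union is enough.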



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
