[jira] [Commented] (CASSANDRA-8717) Top-k queries with custom secondary indexes

Sylvain Lebresne (JIRA) Fri, 27 Mar 2015 04:46:13 -0700

    [ 
https://issues.apache.org/jira/browse/CASSANDRA-8717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14383704#comment-14383704
 ]


Sylvain Lebresne commented on CASSANDRA-8717:
---------------------------------------------

I don't have a problem with this in theory, at least in 3.0 (I tend to agree 
with Aleksey on that part), though I could argue that what you fundamentally 
ask for is not specific to indexing. What you want is a way to "transform" the 
result of internal queries. It's rather close to aggregation, except that 
instead of transforming multiple rows into a single one, you want to transform 
some rows into other rows (sorting them being just one particular use case of 
that). The fact that the results you want to transform are the results of your 
custom index is kind of incidental. So I do feel that implementing this as the 
more general concept of results transformation would be cleaner (and more 
generic). However, doing so is probably a little bit more involved, so I'm 
happy to "hijack" the 2ndary index API for that in the short term and leave 
generalization for later, provided we agree that we may generalize this better 
at that point and thus slightly break those new APIs.
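
To make the "results transformation" idea a bit more concrete, here is a rough 
sketch of what such a hook could look like, with top-k as just one 
implementation of it. Every name below ({{ResultTransformer}}, {{TopK}}, 
{{transform}}) is hypothetical: none of this exists in Cassandra or in the 
attached patch, and rows are modelled as a plain generic type rather than the 
real {{Row}} class.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

// Hypothetical hook: turns the reconciled rows of a query into other rows.
// Not an existing Cassandra interface.
interface ResultTransformer<R>
{
    List<R> transform(List<R> reconciledRows);
}

// Top-k as one particular transformation: order the rows, keep the first k.
final class TopK<R> implements ResultTransformer<R>
{
    private final Comparator<? super R> order;
    private final int k;

    TopK(Comparator<? super R> order, int k)
    {
        if (k < 0)
            throw new IllegalArgumentException("k must be >= 0");
        this.order = order;
        this.k = k;
    }

    @Override
    public List<R> transform(List<R> reconciledRows)
    {
        // Work on a copy so the caller's list is left untouched.
        List<R> sorted = new ArrayList<>(reconciledRows);
        Collections.sort(sorted, order);
        return new ArrayList<>(sorted.subList(0, Math.min(k, sorted.size())));
    }
}
```

Under this shape, sorting is only one possible {{transform}}; a filter or a 
re-ranking step would plug in the same way, which is the sense in which the 
index is incidental.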

Now on the patch, I do think {{requiresFullScan}} somewhat breaks the 
{{concurrencyFactor}} computation in {{getRangeSlice}}, as {{remainingRows}} can 
become negative. This is not a huge deal in the sense that the code ensures the 
{{concurrencyFactor}} is never smaller than 1, but it still is kind of wrong in 
principle. In fact, that method is really about modifying the query limit 
internally (up until the combine method has been applied), and that's imo the 
proper way to expose it.
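
As an illustration of the negative {{remainingRows}} case, here is a 
deliberately simplified model of that kind of computation. It is an 
assumption-laden sketch, not the actual {{StorageProxy}} code: the class name, 
the method names and the exact formula are made up for the example, and only 
the "never smaller than 1" clamp is taken from the behaviour described above.

```java
// Simplified, hypothetical model of the computation discussed above.
// It is NOT the actual StorageProxy code; names and formula are assumptions.
final class ConcurrencyFactorSketch
{
    private ConcurrencyFactorSketch() {}

    /** Rows still needed to reach the limit; negative once more rows than the limit were fetched. */
    static int remainingRows(int limit, int rowsFetchedSoFar)
    {
        return limit - rowsFetchedSoFar;
    }

    /** Number of ranges to query in parallel next; never below 1, never above rangesLeft when rangesLeft >= 1. */
    static int concurrencyFactor(int remainingRows, float rowsPerRange, int rangesLeft)
    {
        int wanted = Math.round(remainingRows / rowsPerRange);
        return Math.max(1, Math.min(rangesLeft, wanted));
    }
}
```

In this model, a query with a limit of 10 that has already gathered 25 rows 
(because every range is scanned before trimming) gets a {{remainingRows}} of 
-15, and the factor only ends up at 1 thanks to the final clamp rather than 
because the arithmetic is meaningful.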

Another nit is that we should rename the {{sort}} method to something more 
generic (as said above, sorting is somewhat of a special case and there is no 
reason to imply a limitation to that). It could be renamed {{combine}} or, imo 
a bit better, something like {{postReconciliationProcessing}}.


> Top-k queries with custom secondary indexes
> -------------------------------------------
>
>                 Key: CASSANDRA-8717
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8717
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Andrés de la Peña
>            Assignee: Andrés de la Peña
>            Priority: Minor
>              Labels: 2i, secondary_index, sort, sorting, top-k
>             Fix For: 3.0
>
>         Attachments: 0001-Add-support-for-top-k-queries-in-2i.patch
>
>
> As presented in [Cassandra Summit Europe 
> 2014|https://www.youtube.com/watch?v=Hg5s-hXy_-M], secondary indexes can be 
> modified to support general top-k queries with minimal changes to the 
> Cassandra codebase. This way, custom 2i implementations could provide 
> relevance search, sorting by columns, etc.
> Top-k queries retrieve the k best results for a certain query. That implies 
> querying the k best rows in each token range and then sorting them in order 
> to obtain the k globally best rows. 
> For doing that, we propose two additional methods in class 
> SecondaryIndexSearcher:
> {code:java}
> public boolean requiresFullScan(List<IndexExpression> clause)
> {
>     return false;
> }
> public List<Row> sort(List<IndexExpression> clause, List<Row> rows)
> {
>     return rows;
> }
> {code}
> The first one indicates whether a query performed in the index requires 
> querying all the nodes in the ring. It is necessary in top-k queries because 
> we do not know which nodes hold the best results. The second method 
> specifies how to sort all the partial node results according to the query. 
> Then we add two similar methods to the class AbstractRangeCommand:
> {code:java}
>     this.searcher = Keyspace.open(keyspace).getColumnFamilyStore(columnFamily).indexManager.searcher(rowFilter);
> public boolean requiresFullScan()
> {
>     return searcher == null ? false : searcher.requiresFullScan(rowFilter);
> }
> public List<Row> combine(List<Row> rows)
> {
>     return searcher == null ? trim(rows) : trim(searcher.sort(rowFilter, rows));
> }
> }
> {code}
> Finally, we modify StorageProxy#getRangeSlice to use the previous methods, as 
> shown in the attached patch.
> We think that the proposed approach provides very useful functionality with 
> minimal impact on the current codebase.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
