Andrés de la Peña created CASSANDRA-8717:
--------------------------------------------
Summary: Top-k queries with custom secondary indexes
Key: CASSANDRA-8717
URL: https://issues.apache.org/jira/browse/CASSANDRA-8717
Project: Cassandra
Issue Type: Improvement
Components: Core
Reporter: Andrés de la Peña
Priority: Minor
Fix For: 2.1.3
Attachments: 0001-Add-support-for-top-k-queries-in-2i.patch
As presented at [Cassandra Summit Europe
2014|https://www.youtube.com/watch?v=Hg5s-hXy_-M], secondary indexes can be
modified to support general top-k queries with minimal changes to the Cassandra
codebase. This way, custom 2i implementations could provide relevance search,
sorting by columns, etc.
Top-k queries retrieve the k best results for a given query. That implies
querying the k best rows in each token range and then sorting them in order to
obtain the k globally best rows.
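The merge step described above can be sketched as follows. This is only an illustration under stated assumptions: ScoredRow and TopKMerge are hypothetical names that are not part of Cassandra, and "best" is assumed to mean "highest numeric score".

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical stand-in for a row returned by an index search (not a Cassandra class).
final class ScoredRow {
    final String key;
    final double score;

    ScoredRow(String key, double score) {
        this.key = key;
        this.score = score;
    }
}

class TopKMerge {
    // Each inner list holds the k best rows of one token range.
    // The result holds the k globally best rows, highest score first.
    static List<ScoredRow> mergeTopK(List<List<ScoredRow>> partials, int k) {
        List<ScoredRow> all = new ArrayList<>();
        for (List<ScoredRow> partial : partials) {
            all.addAll(partial);
        }
        all.sort(Comparator.comparingDouble((ScoredRow r) -> r.score).reversed());
        return new ArrayList<>(all.subList(0, Math.min(k, all.size())));
    }
}
```

Because every range contributes its own k best rows, no globally best row can be missing from the merged candidate list, so sorting and trimming the union is sufficient.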
To do that, we propose two additional methods in the class
SecondaryIndexSearcher:
{code:java}
public boolean requiresFullScan(List<IndexExpression> clause)
{
    return false;
}

public List<Row> sort(List<IndexExpression> clause, List<Row> rows)
{
    return rows;
}
{code}
The first one indicates whether a query performed on the index requires querying
all the nodes in the ring. This is necessary for top-k queries because we do not
know in advance which nodes hold the best results. The second method specifies
how to sort the partial results from all nodes according to the query.
Then we add two similar methods to the class AbstractRangeCommand:
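{code:java}
// Searcher lookup, assigned when the command is built (see the attached patch)
this.searcher =
    Keyspace.open(keyspace).getColumnFamilyStore(columnFamily).indexManager.searcher(rowFilter);

// Delegates to the searcher; without a searcher no full scan is needed
public boolean requiresFullScan()
{
    return searcher != null && searcher.requiresFullScan(rowFilter);
}

// Sorts the collected rows with the searcher, if any, and then trims them
public List<Row> combine(List<Row> rows)
{
    return searcher == null ? trim(rows) : trim(searcher.sort(rowFilter, rows));
}
{code}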
Finally, we modify StorageProxy#getRangeSlice to use the previous methods, as
shown in the attached patch.
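To illustrate how the two hooks could interact on the coordinator side, here is a simplified, self-contained sketch. It is not the code from the attached patch: SearcherSketch and RangeScanSketch are hypothetical names, rows are modelled as plain integers, the clause argument is omitted, and it assumes that a range scan normally stops as soon as it has collected enough rows.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical, simplified stand-in for the proposed searcher hooks.
interface SearcherSketch {
    boolean requiresFullScan();
    List<Integer> sort(List<Integer> rows);
}

class RangeScanSketch {
    // Walks the token ranges in order. The early exit is skipped when the
    // searcher requires a full scan, so that every range can contribute rows.
    static List<Integer> getRangeSlice(List<List<Integer>> ranges, SearcherSketch searcher, int limit) {
        boolean fullScan = searcher != null && searcher.requiresFullScan();
        List<Integer> rows = new ArrayList<>();
        for (List<Integer> range : ranges) {
            rows.addAll(range);
            if (!fullScan && rows.size() >= limit) {
                break;
            }
        }
        // Equivalent of combine(): sort with the searcher if present, then trim.
        List<Integer> combined = searcher == null ? rows : searcher.sort(rows);
        return new ArrayList<>(combined.subList(0, Math.min(limit, combined.size())));
    }

    // Example searcher for a top-k query ordered by descending value.
    static SearcherSketch descending() {
        return new SearcherSketch() {
            public boolean requiresFullScan() {
                return true;
            }

            public List<Integer> sort(List<Integer> rows) {
                List<Integer> sorted = new ArrayList<>(rows);
                sorted.sort(Collections.reverseOrder());
                return sorted;
            }
        };
    }
}
```

With ranges [3, 1], [9, 2] and [7, 8] and a limit of 2, the scan without a searcher stops after the first range and returns [3, 1], while the scan with the descending searcher visits every range and returns [9, 8].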
We think that the proposed approach provides very useful functionality with
minimal impact on the current codebase.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)