[jira] [Commented] (CASSANDRA-9459) SecondaryIndex API redesign

JIRA Mon, 27 Jul 2015 06:21:09 -0700

[https://issues.apache.org/jira/browse/CASSANDRA-9459?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14642709#comment-14642709]


Andrés de la Peña commented on CASSANDRA-9459:
----------------------------------------------

[~slebresne], [~beobal], I'm not yet familiar with the 3.0 changes, but as far
as I understand, the iterator passed to {{postReconciliationProcessing}} lets
it read {{n}} rows from each of the ranges involved. That makes it a much
cleaner way to implement the top-k feature. However, I have doubts about the
concurrency factor, which depends on {{estimateResultsPerRange}}. Since top-k
queries always have to scan all ranges, I think it would be better to fix the
concurrency factor to the number of ranges, unless I am missing something.
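A minimal sketch of the two concurrency policies being compared, plus the merge step that makes top-k need every range. It assumes the estimate-based factor is roughly the limit divided by the estimated rows per range, capped by the number of ranges; {{TopKSketch}} and its methods are hypothetical names, not Cassandra APIs.

{code:java}
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical helpers for illustration; these are not Cassandra APIs.
final class TopKSketch {
    private TopKSketch() {}

    // Estimate-based policy (assumed shape): contact just enough ranges to
    // probably satisfy the limit, given the estimated matches per range.
    static int estimatedConcurrency(int totalRanges, int limit, double estimateResultsPerRange) {
        if (estimateResultsPerRange <= 0) {
            return Math.max(1, totalRanges); // no usable estimate: query every range
        }
        int needed = (int) Math.ceil(limit / estimateResultsPerRange);
        return Math.max(1, Math.min(totalRanges, needed));
    }

    // Proposed top-k policy: any range may hold one of the k best rows,
    // so all of them must be read; query them all in parallel.
    static int topKConcurrency(int totalRanges) {
        return Math.max(1, totalRanges);
    }

    // Merge the scores returned by each range into a global top-k,
    // keeping a min-heap of the k best scores seen so far.
    static List<Integer> mergeTopK(List<List<Integer>> perRangeScores, int k) {
        PriorityQueue<Integer> best = new PriorityQueue<>();
        for (List<Integer> range : perRangeScores) {
            for (int score : range) {
                best.offer(score);
                if (best.size() > k) {
                    best.poll(); // evict the worst score currently kept
                }
            }
        }
        List<Integer> result = new ArrayList<>(best);
        result.sort(Comparator.reverseOrder()); // best first
        return result;
    }
}
{code}

For example, with 256 ranges, a limit of 100 and an estimate of 50 matching rows per range, the estimate-based policy starts with only 2 ranges, although a top-k query cannot return its final answer until all 256 ranges have been read and merged.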

[~beobal], the new API looks great. I especially like the method
{{updateRow(Row oldRow, Row newRow)}}. True per-row indexes, not linked to any
specific column, are a big win. Adding support for more operators such as OR is
a good idea, but I think there should be a way to add custom query syntax.
Currently we are using column-linked queries such as:
{code:sql}
SELECT * FROM tweets WHERE lucene='{
    filter : {type:"boolean", must:[
                   {type:"range", field:"time", lower:"2014/04/25",
                    upper:"2014/05/1", pattern:"yyyy/MM/dd"},
                   {type:"prefix", field:"user", value:"a"} ] },
    query  : {type:"phrase", field:"body", value:"big data gives organizations",
              slop:1, max_expansions:1},
    sort   : {fields: [ {field:"time", reverse:true} ] }
}' limit 100;
{code}
I'm wondering how it can be done with the new approach.
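To illustrate why a per-row hook like {{updateRow(Row oldRow, Row newRow)}} suits indexes that are not tied to a single column, here is a minimal sketch using a toy in-memory inverted index. The {{Row}} type, {{RowIndexSketch}} class and {{search}} method are hypothetical; a row is modelled simply as a key plus a map of column values, and every column value is indexed as one term.

{code:java}
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical, simplified per-row index; not the Cassandra 3.0 index API.
final class RowIndexSketch {
    // Hypothetical row model: partition key plus column name -> value.
    static final class Row {
        final String key;
        final Map<String, String> columns;
        Row(String key, Map<String, String> columns) {
            this.key = key;
            this.columns = columns;
        }
    }

    // Term (any column value) -> keys of rows containing it.
    private final Map<String, Set<String>> postings = new HashMap<>();

    // Receives the whole old and new versions of a row, so every column is
    // re-indexed together. oldRow is null on insert, newRow is null on delete.
    void updateRow(Row oldRow, Row newRow) {
        if (oldRow != null) {
            for (String value : oldRow.columns.values()) {
                Set<String> keys = postings.get(value);
                if (keys != null) {
                    keys.remove(oldRow.key);
                    if (keys.isEmpty()) {
                        postings.remove(value); // drop empty posting lists
                    }
                }
            }
        }
        if (newRow != null) {
            for (String value : newRow.columns.values()) {
                postings.computeIfAbsent(value, v -> new HashSet<>()).add(newRow.key);
            }
        }
    }

    // Keys of rows having the given value in any column.
    Set<String> search(String value) {
        return Collections.unmodifiableSet(postings.getOrDefault(value, Collections.emptySet()));
    }
}
{code}

Because the hook sees complete rows rather than individual cells, the index can drop stale terms from the old version and add terms from the new one in a single step, whichever columns changed.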

Another interesting idea, which I don't know whether 3.0 already addresses, is
to support paging over indexes that return results in an order different from
the one defined by the partitioner and the column names. In Cassandra 2.x this
is problematic because the last row key is used as the start of the next page's
{{DataRange}}, whereas it would be preferable to have a {{DataRange}} containing
both the original key range requested by the user and the last key of the
previous page. Currently we are addressing this with a custom, ugly
{{QueryHandler}}, but it would be nice to have more generic support for this,
unless it already exists.
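A minimal sketch of the paging state described above, assuming string partition keys compared lexicographically in place of tokens and a score-based index order. {{IndexPaging}}, {{Hit}} and {{PagingState}} are hypothetical names, not Cassandra classes. The state keeps the user's original range untouched and remembers the last hit returned, so the next page filters on index order instead of narrowing the key range.

{code:java}
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical types for illustration only; not Cassandra classes.
final class IndexPaging {
    private IndexPaging() {}

    // A matching row: its partition key plus the score the index orders by.
    static final class Hit {
        final String key;
        final double score;
        Hit(String key, double score) {
            this.key = key;
            this.score = score;
        }
    }

    // Keeps the user's original key range unchanged and separately records
    // the last hit returned (null before the first page).
    static final class PagingState {
        final String rangeStart; // inclusive
        final String rangeEnd;   // exclusive
        final Hit lastReturned;
        PagingState(String rangeStart, String rangeEnd, Hit lastReturned) {
            this.rangeStart = rangeStart;
            this.rangeEnd = rangeEnd;
            this.lastReturned = lastReturned;
        }
        PagingState next(Hit last) {
            return new PagingState(rangeStart, rangeEnd, last); // same range, new cursor
        }
    }

    // Index order: highest score first, ties broken by key.
    static final Comparator<Hit> INDEX_ORDER =
        Comparator.comparingDouble((Hit h) -> h.score).reversed()
                  .thenComparing((Hit h) -> h.key);

    static List<Hit> page(List<Hit> hits, PagingState state, int pageSize) {
        List<Hit> candidates = new ArrayList<>();
        for (Hit h : hits) {
            boolean inRange = h.key.compareTo(state.rangeStart) >= 0
                && h.key.compareTo(state.rangeEnd) < 0;
            boolean afterLast = state.lastReturned == null
                || INDEX_ORDER.compare(h, state.lastReturned) > 0;
            if (inRange && afterLast) {
                candidates.add(h);
            }
        }
        candidates.sort(INDEX_ORDER);
        return new ArrayList<>(candidates.subList(0, Math.min(pageSize, candidates.size())));
    }
}
{code}

With hits a=0.9, b=0.5, c=0.7 and d=0.8 in the range [a, m) and a page size of 2, the first page is a, d and the second is c, b. Restarting the second page at key d, as the 2.x behaviour described above does, would skip c and b entirely.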

> SecondaryIndex API redesign
> ---------------------------
>
>                 Key: CASSANDRA-9459
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-9459
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Sam Tunnicliffe
>            Assignee: Sam Tunnicliffe
>             Fix For: 3.0 beta 1
>
>
> For some time now the index subsystem has been a pain point and in large part 
> this is due to the way that the APIs and principal classes have grown 
> organically over the years. It would be a good idea to conduct a wholesale 
> review of the area and see if we can come up with something a bit more 
> coherent.
> A few starting points:
> * There's a lot in AbstractPerColumnSecondaryIndex & its subclasses which 
> could be pulled up into SecondaryIndexSearcher (note that to an extent, this 
> is done in CASSANDRA-8099).
> * SecondaryIndexManager is overly complex and several of its functions should
> be simplified/re-examined. The handling of which columns are indexed and 
> index selection on both the read and write paths are somewhat dense and 
> unintuitive.
> * The SecondaryIndex class hierarchy is rather convoluted and could use some 
> serious rework.
> There are a number of outstanding tickets which we should be able to roll 
> into this higher level one as subtasks (but I'll defer doing that until 
> getting into the details of the redesign):
> * CASSANDRA-7771
> * CASSANDRA-8103
> * CASSANDRA-9041
> * CASSANDRA-4458
> * CASSANDRA-8505
> Whilst they're not hard dependencies, I propose that this be done on top of 
> both CASSANDRA-8099 and CASSANDRA-6717. The former largely because the 
> storage engine changes may facilitate a friendlier index API, but also 
> because of the changes to SIS mentioned above. As for 6717, the changes to 
> schema tables there will help facilitate CASSANDRA-7771.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
