[jira] [Commented] (CASSANDRA-9459) SecondaryIndex API redesign

Sam Tunnicliffe (JIRA) Wed, 19 Aug 2015 09:40:59 -0700

    [ 
https://issues.apache.org/jira/browse/CASSANDRA-9459?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14703314#comment-14703314
 ]


Sam Tunnicliffe commented on CASSANDRA-9459:
--------------------------------------------

I've pushed some further commits which I think make this ready for another 
round of review (pinging [~slebresne])

Outstanding issues (some of which may be split out to follow up tickets):

* I'd particularly like some feedback on the way that 
{{SIM.WriteTimeTransaction#onUpdated}} and {{Index#updateRow}} handle Memtable 
updates. Rather than pass the existing, update and reconciled versions of the 
row to every registered Index, and have each of them potentially perform a very 
similar set of diff operations, the diffing is done once (in the transaction) 
and only the deltas are passed to the {{Indexes}} - one row containing the 
subset of the existing row that is now gone, another with the subset of the 
merged row that was added in the operation. This feels a bit counter-intuitive 
and writing the javadoc on {{Index#updateRow}} was tough, so I'm a bit 
concerned that this is going to be tricky for implementors to work with (but I 
don't want to duplicate the diffing if we can avoid it).
* Provide a better API for (re)building indexes. The current approach assumes 
that indexes should always be built from the merged view of data in SSTables, 
but this may not always be the case. That said, this is true for most existing 
implementations, and so is optimised to perform only a single pass through the 
data. I don't want to prohibit that optimisation, so some further thought it 
required.
* Lookups in {{SecondaryIndexManager}} could certainly be improved with some 
better datastructures, rather than always resorting to a scan through the 
entire collection of registered indexes
* There is a mismatch between the name of an index as stored in schema and in 
the value returned from Index.getIndexName, which for the builtin index impls 
is the name of the underlying index CFS. This leaks into a number of places, 
notably around (re)building indexes. I've opened CASSANDRA-10127 for this.
* I'm not entirely happy with the way we validate restrictions using 
{{Index.supportsExpression}}. It seems a bit blunt, but I haven't been able to 
come up with anything better yet.


I've avoided squashing the [wip 
branch|https://github.com/beobal/cassandra/tree/9459-wip] since people have 
already commented on that, but I have had to rebase it several times so 
although the commit history has been overwritten, it's remained more or less 
semantically consistent.


> SecondaryIndex API redesign
> ---------------------------
>
>                 Key: CASSANDRA-9459
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-9459
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Sam Tunnicliffe
>            Assignee: Sam Tunnicliffe
>             Fix For: 3.0 beta 1
>
>
> For some time now the index subsystem has been a pain point and in large part 
> this is due to the way that the APIs and principal classes have grown 
> organically over the years. It would be a good idea to conduct a wholesale 
> review of the area and see if we can come up with something a bit more 
> coherent.
> A few starting points:
> * There's a lot in AbstractPerColumnSecondaryIndex & its subclasses which 
> could be pulled up into SecondaryIndexSearcher (note that to an extent, this 
> is done in CASSANDRA-8099).
> * SecondayIndexManager is overly complex and several of its functions should 
> be simplified/re-examined. The handling of which columns are indexed and 
> index selection on both the read and write paths are somewhat dense and 
> unintuitive.
> * The SecondaryIndex class hierarchy is rather convoluted and could use some 
> serious rework.
> There are a number of outstanding tickets which we should be able to roll 
> into this higher level one as subtasks (but I'll defer doing that until 
> getting into the details of the redesign):
> * CASSANDRA-7771
> * CASSANDRA-8103
> * CASSANDRA-9041
> * CASSANDRA-4458
> * CASSANDRA-8505
> Whilst they're not hard dependencies, I propose that this be done on top of 
> both CASSANDRA-8099 and CASSANDRA-6717. The former largely because the 
> storage engine changes may facilitate a friendlier index API, but also 
> because of the changes to SIS mentioned above. As for 6717, the changes to 
> schema tables there will help facilitate CASSANDRA-7771.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (CASSANDRA-9459) SecondaryIndex API redesign

Reply via email to