[jira] [Commented] (CASSANDRA-10124) Full support for multi-column and per-row indexes

Sam Tunnicliffe (JIRA) Wed, 26 Aug 2015 03:15:13 -0700

    [ 
https://issues.apache.org/jira/browse/CASSANDRA-10124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14712882#comment-14712882
 ]


Sam Tunnicliffe commented on CASSANDRA-10124:
---------------------------------------------

Some proposals for syntax:

h6. Multi column custom index DDL
{code}
CREATE CUSTOM INDEX ON test_table(col1, col2, col3) USING 
'com.foo.bar.CustomIndex';
{code} 
When the targets include non-frozen collections, I think in general we should 
disallow the use of qualifiers, i.e. keys()/entries(). The main reason for 
including those when creating non-custom indexes is so C* can select the right 
index implementation, but for CUSTOM indexes that's done explicitly. Another 
argument is a practical one; when we store that info in schema tables it's 
always been with assumption of a single target column, so we have never needed 
to explicitly link the qualifier to a column. Its representation is as a key 
without a value; in pre-6717 schema in the json string in 
{{system.schema_columns.index_options}}, and in 3.0 in the native map 
{{system_schema.indexes.options}}.

h6. Row based custom index DDL
{code}
CREATE CUSTOM INDEX ON test_table USING 'com.foo.bar.CustomIndex';
CREATE CUSTOM INDEX ON test_table() USING 'com.foo.bar.CustomIndex';
CREATE CUSTOM ROW INDEX ON test_table USING 'com.foo.bar.CustomIndex';
{code}
Personally, I prefer the first option for simplicity.

h6. Select syntax

There are 2 parts to this, modifying the relations syntax so to extend beyond 
the usual {{<identifier> <operator> <term>}} syntax and adding a means to force 
(or at least suggest) a specific index be used for a particular query. Both are 
required for true row based indexes, but I think the latter could also be used 
with single and multicolumn indexes.

For providing an explicit instruction to C* that it should bypass the usual 
index selection mechanisms and force the use of a particular index I propose 
adding an optional {{USING INDEX}} to the {{SELECT}} syntax. The change to 
decouple a query term from a particular column identifier could be as simple as 
just allowing a standalone string literal in the {{WHERE}} clause, or if we 
want to be more explicit we could introduce a pseudo function in the system 
keyspace along the lines of {{token()}}.

To take one of the examples from the [Stratio 
docs|https://github.com/Stratio/cassandra-lucene-index]:

{code}
SELECT * FROM tweets WHERE lucene='{
    filter : {type:"range", field:"time", lower:"2014/04/25", 
upper:"2014/05/1"},
    query  : {type:"phrase", field:"body", value:"big data gives 
organizations", slop:1}
}' 
{code}

The {{lucene}} field is a dummy column that has to be added to the table in 
order to associate the pre-3.0 row-based index with the {{tweets}} table. We 
could rewrite this query as:

{code}
SELECT * FROM tweets 
WHERE expr('{filter : {type:"range", field:"time", lower:"2014/04/25", 
upper:"2014/05/1"},
             query  : {type:"phrase", field:"body", value:"big data gives 
organizations", slop:1}}') 
USING INDEX lucene;
{code}

In this version {{lucene}} is the name of the index, rather than a column and 
the query string itself is wrapped in the {{expr()}} function. 

As mentioned, we could also allow {{USING INDEX}} with the usual relation 
syntax, for example:
{code}
CREATE TABLE ks.t1(k int, v1 int, v2 int, PRIMARY KEY (k));
CREATE INDEX v1_idx ON ks.t1(v1);
CREATE INDEX v2_idx ON ks.t1(v2);
CREATE CUSTOM INDEX v1_v2_idx ON ks.t1(v1, v2) USING 
'com.foo.bar.CustomMultiColumnIndex';
{code}

There are couple of things to consider about querying this table using the 
indexes:
{code}
SELECT * FROM ks.t1 WHERE v1=0 AND v2=0 ALLOW FILTERING;
{code}

* Even though we have an index which covers both {{v1}} and {{v2}}, there's no 
guarantee that the query will use it, as the either of the regular indexes may 
be considered more selective under the current scheme (or even equally 
selective, in which case the choice is undefined). 
* Even if we *only* have the custom index we still need to {{ALLOW FILTERING}} 
as the index expressions are only ever considered individually, so if there is 
> 1 {{ALLOW FILTERING}} is always required.

{code}
SELECT * FROM ks.t1 WHERE v1=0 AND v2=0 USING v1_v2_idx;
{code}

We could allow {{USING INDEX}} both to force the choice of index and to 
override the requirement for {{ALLOW FILTERING}}. Perhaps the {{ALLOW 
FILTERING}} override could be dependent on all the restrictions being able to 
besatisfied by the specified index. Anyway, it seems that something akin to 
Oracle's [index 
hinting|http://docs.oracle.com/cd/B19306_01/server.102/b14211/hintsref.htm#CHDJDIAH]
 might be generally useful for power users and/or during development.

> Full support for multi-column and per-row indexes
> -------------------------------------------------
>
>                 Key: CASSANDRA-10124
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-10124
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Sam Tunnicliffe
>            Assignee: Sam Tunnicliffe
>              Labels: client-impacting
>             Fix For: 3.0 beta 2
>
>
> Since CASSANDRA-6717 decoupled a secondary index from a single column, we can 
> expand support for indexes with multiple target columns and for row-based 
> indexes with truly dynamic targets.  
> Much of the plumbing for this has been done in CASSANDRA-7771, CASSANDRA-6717 
> & by the API rework in CASSANDRA-9459. What remains is:
> * Decide on syntax for DDL statements
> * Decide on WHERE clause syntax for SELECT (there is some discussion on this 
> in the comments of CASSANDRA-9459)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (CASSANDRA-10124) Full support for multi-column and per-row indexes

Reply via email to