[jira] [Commented] (CASSANDRA-13499) Avoid duplicate calls to the same custom row index

Sam Tunnicliffe (JIRA) Mon, 15 May 2017 08:06:50 -0700

    [ 
https://issues.apache.org/jira/browse/CASSANDRA-13499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16010653#comment-16010653
 ]


Sam Tunnicliffe commented on CASSANDRA-13499:
---------------------------------------------

bq. these implementations does not allow to mix various index implementations, 
when you have for a regular index for column A and custom row-based index for 
column B+C

No, that's not the case. It's perfectly possible to mix indexes in exactly the 
way you describe. At update time, each index is asked, in 
{{SIM::newUpdateTransaction}}, whether it is interested in the incoming update. 
Given the set of columns that the update contains, the index implementation 
either returns an {{Indexer}} if it should process the update, or null if not. 
One of the drivers for reworking the index API in CASSANDRA-9459 was to make 
row-based indexes less of a hack that piggybacks on column-based indexes. 
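
As a rough illustration of that update-time dispatch, here is a minimal sketch. The {{Index}}, {{Indexer}} and {{SimpleIndex}} types below are simplified, hypothetical stand-ins written for this example, not the real Cassandra interfaces, and the column-overlap test is only an assumption about how an implementation might decide whether it is interested.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical, simplified stand-ins; not the real Cassandra Index API.
class UpdateDispatchSketch {
    interface Indexer {
        void insertRow(String row);
    }

    interface Index {
        // Returns an Indexer if this index wants the update, or null if not.
        Indexer indexerFor(Set<String> updatedColumns);
    }

    static final class SimpleIndex implements Index {
        final String name;
        final Set<String> columns;
        final List<String> seen = new ArrayList<>();

        SimpleIndex(String name, String... columns) {
            this.name = name;
            this.columns = new HashSet<>(Arrays.asList(columns));
        }

        @Override
        public Indexer indexerFor(Set<String> updatedColumns) {
            // Assumed policy: interested only if an indexed column is touched.
            if (Collections.disjoint(columns, updatedColumns)) {
                return null;
            }
            return row -> seen.add(row);
        }
    }

    // Ask every registered index; keep only the non-null Indexers.
    static List<Indexer> newUpdateTransaction(Collection<Index> registered,
                                              Set<String> updatedColumns) {
        List<Indexer> indexers = new ArrayList<>();
        for (Index index : registered) {
            Indexer indexer = index.indexerFor(updatedColumns);
            if (indexer != null) {
                indexers.add(indexer);
            }
        }
        return indexers;
    }

    public static void main(String[] args) {
        SimpleIndex regular = new SimpleIndex("a_idx", "a");        // column A
        SimpleIndex rowBased = new SimpleIndex("bc_idx", "b", "c"); // columns B+C
        List<Index> registered = Arrays.asList(regular, rowBased);

        Set<String> updated = new HashSet<>(Arrays.asList("b"));
        for (Indexer indexer : newUpdateTransaction(registered, updated)) {
            indexer.insertRow("row1");
        }
        System.out.println(regular.seen + " " + rowBased.seen); // prints [] [row1]
    }
}
```

Here an update touching only column B reaches the B+C index and is ignored by the index on A, which is the kind of mixing described above.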

It's actually more of an issue to determine the correct index for a given 
query, as the built-in heuristics-based approach may not be ideal for every 
implementation. Essentially, at query time each registered index which supports 
at least one of the query's index expressions provides an estimated result 
count. The naive approach to selection simply chooses whichever index expects 
to return the fewest results. Clearly, this is quite simplistic (of course, 
Index implementations are free to decide how they come up with the estimate), 
so there is a means to force the use of a specific index using custom 
expressions. I'll refer to the Stratio implementation for an example:

{code}
SELECT * FROM tweets WHERE expr(tweets_index, '{
   filter: [
      {type: "range", field: "time", lower: "2014/04/25", upper: "2014/05/01"},
      {type: "prefix", field: "user", value: "a"},
      {type: "geo_distance", field: "place", latitude: 40.3930, longitude: -3.7328, max_distance: "1km"}
   ],
   query: {type: "phrase", field: "body", value: "big data gives organizations", slop: 1},
   sort: {field: "time", reverse: true}
}') limit 100;
{code}

With a custom expression, the first argument is the index name, and the 
presence of a custom expression in the query ensures that the named index is 
used. The second argument is the implementation-specific query info.
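
To make the selection rule concrete, here is a minimal sketch of it: pick the supporting index with the lowest estimate, unless a custom expression names an index explicitly. The {{Candidate}} class, the {{select}} method and the numbers are hypothetical and written only for this example; they are not Cassandra's actual selection code.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical model of query-time index selection; not Cassandra's classes.
class IndexSelectionSketch {
    static final class Candidate {
        final String name;
        final long estimatedResultRows;
        final Set<String> columns;

        Candidate(String name, long estimatedResultRows, String... columns) {
            this.name = name;
            this.estimatedResultRows = estimatedResultRows;
            this.columns = new HashSet<>(Arrays.asList(columns));
        }
    }

    // Returns the chosen index name, or null if no index supports the query.
    static String select(List<Candidate> registered,
                         Set<String> queryColumns,
                         String customExpressionIndex) {
        // A custom expression forces the named index, bypassing the estimates.
        if (customExpressionIndex != null) {
            for (Candidate c : registered) {
                if (c.name.equals(customExpressionIndex)) {
                    return c.name;
                }
            }
            throw new IllegalArgumentException("Unknown index: " + customExpressionIndex);
        }
        // Otherwise: among indexes supporting at least one expression,
        // choose the one expecting the fewest results.
        Candidate best = null;
        for (Candidate c : registered) {
            if (Collections.disjoint(c.columns, queryColumns)) {
                continue;
            }
            if (best == null || c.estimatedResultRows < best.estimatedResultRows) {
                best = c;
            }
        }
        return best == null ? null : best.name;
    }

    public static void main(String[] args) {
        List<Candidate> registered = Arrays.asList(
                new Candidate("a_idx", 500, "a"),
                new Candidate("bc_idx", 20, "b", "c"));
        Set<String> queryColumns = new HashSet<>(Arrays.asList("a", "b"));

        System.out.println(select(registered, queryColumns, null));    // prints bc_idx
        System.out.println(select(registered, queryColumns, "a_idx")); // prints a_idx
    }
}
```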

bq. unfortunately, the alternative syntax CREATE INDEX ... 
<table>(col1,col2....) cannot be updated (no ALTER INDEX statement....).

As you point out, there is no support for {{ALTER INDEX}}, but this is 
primarily because any modification of an index definition is probably going to 
require a rebuild of the existing index. So in order for {{ALTER}} to be more 
useful than simply {{DROP INDEX..CREATE INDEX}}, it would need to manage the 
rebuild in the background so that the old index continued to be used until the 
new one was ready and then perform the swap operation. This is not to say that 
that couldn't be done, just that currently it isn't. My point is that this is 
not specific to any particular type of index.
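
Purely to illustrate the rebuild-then-swap idea, a sketch could look like the following. Nothing like this exists in Cassandra as far as this comment is concerned; {{IndexSwapSketch}} and its methods are hypothetical, and a plain string stands in for an index instance.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Supplier;

// Hypothetical sketch of a background rebuild followed by an atomic swap.
class IndexSwapSketch {
    // The index currently serving queries (a String stands in for it here).
    private final AtomicReference<String> active;

    IndexSwapSketch(String initial) {
        this.active = new AtomicReference<>(initial);
    }

    String active() {
        return active.get();
    }

    // Runs the rebuild asynchronously; the old index stays active until the
    // rebuild completes, and only then is the reference swapped.
    CompletableFuture<Void> alter(Supplier<String> rebuild) {
        return CompletableFuture.supplyAsync(rebuild).thenAccept(active::set);
    }

    public static void main(String[] args) {
        IndexSwapSketch sketch = new IndexSwapSketch("idx_v1");
        CompletableFuture<Void> done = sketch.alter(() -> "idx_v2");
        done.join(); // wait for rebuild and swap
        System.out.println(sketch.active()); // prints idx_v2
    }
}
```

The bookkeeping such a swap would need (handling writes that arrive during the rebuild, cleaning up the old index) is exactly what a plain {{DROP INDEX..CREATE INDEX}} avoids.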



> Avoid duplicate calls to the same custom row index
> --------------------------------------------------
>
>                 Key: CASSANDRA-13499
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-13499
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: vincent royer
>            Priority: Minor
>             Fix For: 3.0.14, 3.11.0, 4.x
>
>         Attachments: 0006-Avoid-duplicate-calls-to-the-same-custom-index.patch
>
>   Original Estimate: 2h
>  Remaining Estimate: 2h
>
> Avoid duplicate calls to the same custom row index by using a dedicated 
> Set<Index> rather than the collection indexes.values().



