[jira] [Commented] (CASSANDRA-8505) Invalid results are returned while secondary index are being build

Sam Tunnicliffe (JIRA) Wed, 11 Nov 2015 09:48:10 -0800

    [ 
https://issues.apache.org/jira/browse/CASSANDRA-8505?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15000767#comment-15000767
 ]


Sam Tunnicliffe commented on CASSANDRA-8505:
--------------------------------------------

I think the various index states can be reduced to a simple ready/not ready 
check. What's more unless we intend to change the established behaviour fairly 
significantly, once an index moves to a ready state it never moves back to 
being not ready. The only times when we modify the status in the system table 
are when the index is removed (in which case we have no problem with being able 
to query using it) or during a rebuild. In the latter case though, we probably 
shouldn't reject queries (and we don't currently), as an index rebuild is 
incremental. That is, we don't scrap the existing index tables and rebuild 
everything from scratch, just write new index SSTables to supercede the old 
ones. So although it's certainly possible to get incorrect results during a 
rebuild (because of missing/stale entries), the results only get more correct 
as the rebuild progresses. Changing this so that all queries against that index 
return errors until all rebuilds complete seems like a step backwards. It seems 
more reasonable to reject queries until the initial build has been performed, 
as per the example in the description, but this only requires a simple boolean 
to track state between instantiating/registering the index and its initial 
build task completing (if one is required). 

It would be good to have some test coverage of this, although the best I could 
come up with is a dtest which inserts many rows, then adds the index and 
queries immediately expecting ReadFailureException, which is fairly lame and 
fragile.

A couple of points specific to the 3.0 patch:

* The fix for CASSANDRA-10595 has been lost. If an index doesn't register 
itself in {{createIndex}}, don't ask it for an initalization task, just set 
{{initialBuildTask == null}}. 
* {{SIM::reloadIndex}} has changed since the patch was created (due to 
CASSANDRA-10604) - I think that no changes to this method are now required. I 
did notice though that the current implementation actually makes a redundant 
call to {{getMetadataReloadTask}}, so if you could fix that while you're here, 
that'd be great.

bq. Secondary index and their build/not build status are node-local. By 
consequence it is not possible to know on a coordinator node if the index is 
fully build. It can be built on the coordinator but still building on other 
nodes

For future reference on this point, we also have CASSANDRA-9967 which has a 
very similar intent.

> Invalid results are returned while secondary index are being build
> ------------------------------------------------------------------
>
>                 Key: CASSANDRA-8505
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8505
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Coordination
>            Reporter: Benjamin Lerer
>            Assignee: Benjamin Lerer
>             Fix For: 2.2.x, 3.0.x
>
>
> If you request an index creation and then execute a query that use the index 
> the results returned might be invalid until the index is fully build. This is 
> caused by the fact that the table column will be marked as indexed before the 
> index is ready.
> The following unit tests can be use to reproduce the problem:
> {code}
>     @Test
>     public void testIndexCreatedAfterInsert() throws Throwable
>     {
>         createTable("CREATE TABLE %s (a int, b int, c int, primary key((a, 
> b)))");
>         execute("INSERT INTO %s (a, b, c) VALUES (0, 0, 0);");
>         execute("INSERT INTO %s (a, b, c) VALUES (0, 1, 1);");
>         execute("INSERT INTO %s (a, b, c) VALUES (0, 2, 2);");
>         execute("INSERT INTO %s (a, b, c) VALUES (1, 0, 3);");
>         execute("INSERT INTO %s (a, b, c) VALUES (1, 1, 4);");
>         
>         createIndex("CREATE INDEX ON %s(b)");
>         
>         assertRows(execute("SELECT * FROM %s WHERE b = ?;", 1),
>                    row(0, 1, 1),
>                    row(1, 1, 4));
>     }
>     
>     @Test
>     public void testIndexCreatedBeforeInsert() throws Throwable
>     {
>         createTable("CREATE TABLE %s (a int, b int, c int, primary key((a, 
> b)))");
>         createIndex("CREATE INDEX ON %s(b)");
>         
>         execute("INSERT INTO %s (a, b, c) VALUES (0, 0, 0);");
>         execute("INSERT INTO %s (a, b, c) VALUES (0, 1, 1);");
>         execute("INSERT INTO %s (a, b, c) VALUES (0, 2, 2);");
>         execute("INSERT INTO %s (a, b, c) VALUES (1, 0, 3);");
>         execute("INSERT INTO %s (a, b, c) VALUES (1, 1, 4);");
>         assertRows(execute("SELECT * FROM %s WHERE b = ?;", 1),
>                    row(0, 1, 1),
>                    row(1, 1, 4));
>     }
> {code}
> The first test will fail while the second will work. 
> In my opinion the first test should reject the request as invalid (as if the 
> index was not existing) until the index is fully build.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (CASSANDRA-8505) Invalid results are returned while secondary index are being build

Reply via email to