[ 
https://issues.apache.org/jira/browse/CASSANDRA-2897?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13283996#comment-13283996
 ] 

Edward Capriolo commented on CASSANDRA-2897:
--------------------------------------------

I ran into this a bit on my Casbase project. In the end, it comes down to what 
the user wants:

https://github.com/edwardcapriolo/casbase

{noformat}
Table.IndexRepair
--REPAIR_ON_READ - correct the index on each read
--REPAIR_ON_WRITE - read before write on insert; invalidates stale index entries
--REPAIR_NONE - take no action; assumes no deletes or overwrites, or that the user 
will handle them (or not care)
{noformat}
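A minimal in-memory sketch of how these three knobs differ on the write path, assuming a base table keyed by pk and a single index on one column. The enum mirrors the names listed above, but the class and methods (`IndexedTable`, `insert`, `delete`, `indexLookup`) are hypothetical illustrations, not Casbase's or Cassandra's actual API. Only REPAIR_ON_WRITE pays for a read before each write; REPAIR_ON_READ and REPAIR_NONE write blindly here, and the read-side repair is not modeled in this block.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical mirror of the Table.IndexRepair knob.
enum IndexRepair { REPAIR_ON_READ, REPAIR_ON_WRITE, REPAIR_NONE }

// Hypothetical model: base rows (pk -> indexed value) plus an index (value -> pks).
class IndexedTable {
    final IndexRepair policy;
    final Map<String, String> base = new HashMap<>();
    final Map<String, Set<String>> index = new HashMap<>();
    int baseReads = 0; // counts the reads done before writes

    IndexedTable(IndexRepair policy) { this.policy = policy; }

    // REPAIR_ON_WRITE only: read the old value and drop its index entry.
    private void readBeforeWrite(String pk) {
        baseReads++;
        String old = base.get(pk);
        if (old != null) {
            Set<String> pks = index.get(old);
            if (pks != null) pks.remove(pk);
        }
    }

    void insert(String pk, String value) {
        if (policy == IndexRepair.REPAIR_ON_WRITE) readBeforeWrite(pk);
        // Every policy writes the row and adds the new index entry.
        base.put(pk, value);
        index.computeIfAbsent(value, v -> new HashSet<>()).add(pk);
    }

    void delete(String pk) {
        if (policy == IndexRepair.REPAIR_ON_WRITE) readBeforeWrite(pk);
        base.remove(pk);
    }

    // Raw index lookup, not verified against the base rows.
    Set<String> indexLookup(String value) {
        return new HashSet<>(index.getOrDefault(value, new HashSet<>()));
    }
}
```

For example, inserting row1 with TX and then overwriting it with CA leaves `indexLookup("TX")` empty under REPAIR_ON_WRITE, while under REPAIR_NONE the stale row1 stays under TX until something else cleans it up.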

Repair on read is the most interesting case to this discussion.

Someone might argue that a query like this would never trigger a repair:

{noformat}
select * from cf1 where state='TX'
{noformat}

That is not quite true, because to build the * result set of cf1 you have to go 
back to cf1 and pull the matching rows.

That is, an index lookup for state='TX' only gives you a result set like:

{noformat}
pk | state
row 1 in other table | 'TX'
row 2 in other table | 'TX'
{noformat}

But the user wants all the columns of row 1, so you have to read those columns 
to build the final result.
If you read row 1 and find that it no longer exists, you can fix the index at 
that point.
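A minimal sketch of that repair-on-read path, assuming rows are stored as column-name to value maps and the index maps a state to a set of pks. The class and helper names (`RepairOnReadQuery`, `write`, `delete`, `selectStarWhereState`) are hypothetical. Because the base row has to be read to build the * result anyway, checking it and dropping a stale index entry costs no extra read.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of REPAIR_ON_READ for: select * from cf1 where state='TX'
class RepairOnReadQuery {
    final Map<String, Map<String, String>> cf1 = new HashMap<>(); // pk -> row
    final Map<String, Set<String>> stateIndex = new HashMap<>();   // state -> pks

    // Blind write: adds the new index entry but never removes an old one.
    void write(String pk, String state) {
        Map<String, String> row = new HashMap<>();
        row.put("pk", pk);
        row.put("state", state);
        cf1.put(pk, row);
        stateIndex.computeIfAbsent(state, s -> new HashSet<>()).add(pk);
    }

    // Delete without touching the index, so its entry becomes stale.
    void delete(String pk) { cf1.remove(pk); }

    List<Map<String, String>> selectStarWhereState(String state) {
        List<Map<String, String>> result = new ArrayList<>();
        Set<String> candidates = stateIndex.getOrDefault(state, new HashSet<>());
        // Iterate over a copy so stale entries can be removed from the live set.
        for (String pk : new ArrayList<>(candidates)) {
            Map<String, String> row = cf1.get(pk); // needed for the "*" result anyway
            if (row == null || !state.equals(row.get("state"))) {
                candidates.remove(pk);             // row deleted or changed: repair
                continue;
            }
            result.add(row);
        }
        return result;
    }

    boolean indexHas(String state, String pk) {
        return stateIndex.getOrDefault(state, new HashSet<>()).contains(pk);
    }
}
```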

Now if the user just asked:
{noformat}
select pk,state from cf1 where state='TX'
{noformat}
Here you could answer the entire query from the index alone and might return 
stale results, because you do not yet know whether that pk was deleted.
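A small sketch of the choice this index-only query forces, assuming the same kind of blind writes and deletes that leave index entries behind. `CoveredQuery`, `fromIndexOnly` and `verified` are hypothetical names. Answering from the index skips the base reads and can return a deleted pk; checking each hit against cf1 is correct but costs one base read per hit.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

// Hypothetical sketch of answering: select pk,state from cf1 where state='TX'
class CoveredQuery {
    final Map<String, String> cf1State = new HashMap<>();           // pk -> state
    final Map<String, TreeSet<String>> stateIndex = new HashMap<>(); // state -> sorted pks
    int baseReads = 0;

    void write(String pk, String state) {
        cf1State.put(pk, state);
        stateIndex.computeIfAbsent(state, s -> new TreeSet<>()).add(pk);
    }

    // Delete without read-before-write: the index entry is left in place.
    void delete(String pk) { cf1State.remove(pk); }

    // Answered entirely from the index: no base reads, but possibly stale.
    List<String> fromIndexOnly(String state) {
        return new ArrayList<>(stateIndex.getOrDefault(state, new TreeSet<>()));
    }

    // Each hit checked against cf1: correct, at one base read per hit.
    List<String> verified(String state) {
        List<String> out = new ArrayList<>();
        for (String pk : stateIndex.getOrDefault(state, new TreeSet<>())) {
            baseReads++;
            if (state.equals(cf1State.get(pk))) out.add(pk);
        }
        return out;
    }
}
```

Skipping the base read is exactly what makes the index-only answer cheap, and also what lets it return a row that no longer exists.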

I think exposing knobs like REPAIR_ON_READ, REPAIR_ON_WRITE and REPAIR_NONE is 
the way to go. Many use cases never modify or overwrite a row, so the whole 
'repair' step is not needed.

Then again, I am a power user who does not expect Cassandra to work like a 
relational database; maybe most people using secondary indexes do.

                
> Secondary indexes without read-before-write
> -------------------------------------------
>
>                 Key: CASSANDRA-2897
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2897
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>    Affects Versions: 0.7.0
>            Reporter: Sylvain Lebresne
>            Priority: Minor
>              Labels: secondary_index
>
> Currently, secondary index updates require a read-before-write to maintain 
> index consistency. Keeping the index consistent at all times is not 
> necessary, however. We could let the (secondary) index become inconsistent on 
> writes and repair it on reads. This would be easy because on reads we 
> make sure to request the indexed columns anyway, so we can just skip the rows 
> that are not needed and repair the index at the same time.
> This does trade work on writes for work on reads. However, read-before-write 
> is sufficiently costly that it will likely be a win overall.
> There are (at least) two small technical difficulties here, though:
> # If we repair on read, this will be racy with writes, so we'll probably have 
> to synchronize there.
> # We probably shouldn't rely only on reads for repair; we should also have a 
> task to repair the index for things that are rarely read. It's unclear how to 
> make that low impact, though.
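The two difficulties in the issue can be sketched together, as a hypothetical illustration rather than how Cassandra implements it. Assumptions: writes and index repairs for the same pk take the same lock (striped here, with an arbitrary 16 stripes, to bound the number of locks), and a background sweep reuses the read-time repair step. `SynchronizedIndexRepair`, `repairEntry` and `sweep` are invented names, and the sweep has no throttling, which is exactly the "low impact" part left open above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical sketch: read-time index repair that cannot race with writes,
// plus a background sweep for entries that are rarely read.
class SynchronizedIndexRepair {
    private final Map<String, String> base = new ConcurrentHashMap<>();       // pk -> value
    private final Map<String, Set<String>> index = new ConcurrentHashMap<>(); // value -> pks
    private final ReentrantLock[] stripes = new ReentrantLock[16];            // arbitrary count

    SynchronizedIndexRepair() {
        for (int i = 0; i < stripes.length; i++) stripes[i] = new ReentrantLock();
    }

    private ReentrantLock lockFor(String pk) {
        return stripes[Math.floorMod(pk.hashCode(), stripes.length)];
    }

    // Write without read-before-write: any old index entry is left behind.
    void write(String pk, String value) {
        ReentrantLock lock = lockFor(pk);
        lock.lock();
        try {
            base.put(pk, value);
            index.computeIfAbsent(value, v -> ConcurrentHashMap.newKeySet()).add(pk);
        } finally {
            lock.unlock();
        }
    }

    // Re-check the base row under the pk's lock, so a concurrent write that
    // re-adds (value, pk) cannot have its fresh entry removed by mistake.
    // Returns true if the entry was stale and has been removed.
    boolean repairEntry(String value, String pk) {
        ReentrantLock lock = lockFor(pk);
        lock.lock();
        try {
            if (value.equals(base.get(pk))) return false; // still valid
            Set<String> pks = index.get(value);
            return pks != null && pks.remove(pk);
        } finally {
            lock.unlock();
        }
    }

    // Repair on read: an unlocked check first; the delete decision is re-made under the lock.
    List<String> read(String value) {
        List<String> out = new ArrayList<>();
        for (String pk : index.getOrDefault(value, Set.of())) {
            if (value.equals(base.get(pk))) out.add(pk);
            else repairEntry(value, pk);
        }
        return out;
    }

    // Background sweep reusing the same repair step; no throttling here.
    int sweep() {
        int removed = 0;
        for (Map.Entry<String, Set<String>> e : index.entrySet()) {
            for (String pk : e.getValue()) { // weakly consistent: removal while iterating is safe
                if (repairEntry(e.getKey(), pk)) removed++;
            }
        }
        return removed;
    }

    boolean indexHas(String value, String pk) {
        Set<String> pks = index.get(value);
        return pks != null && pks.contains(pk);
    }
}
```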

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
