[ https://issues.apache.org/jira/browse/CASSANDRA-2897?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13237782#comment-13237782 ]

Jonathan Ellis commented on CASSANDRA-2897:
-------------------------------------------

Doug added over on the Hypertable post,

bq. In Hypertable, the way deletes are handled is by inserting delete records 
(tombstones), so during compaction the secondary index is purged of stale 
entries by bulk inserting a bunch of delete records. Since Hypertable is 
essentially an LSM tree, bulk inserts are very efficient and require no random 
I/O.

I think I understand: it generates both new, correct index entries, AND 
tombstones for old, invalid ones (the column entries in the parent CF that get 
discarded during compaction).

Changing compaction to expose more than just the surviving value is a fair bit 
of work for us, but doable.
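
A minimal Python sketch of the compaction side, under loud assumptions: `Cell`, `IndexRecord`, `compact()` and `live_entries()` are hypothetical names, not Cassandra's API. The index is modeled as an append-only log of put/tombstone records resolved LSM-style (newest timestamp wins, tombstone wins a tie). It assumes every version of a cell takes part in the same compaction; a version left in an sstable outside the compaction would keep its stale index entry. Parent-CF deletes are ignored.

```python
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple


@dataclass(frozen=True)
class Cell:
    """One version of a parent-CF column (hypothetical model)."""
    row: str
    column: str
    value: str
    ts: int


@dataclass(frozen=True)
class IndexRecord:
    """One record appended to the LSM-style index: a "put" or a "del" (tombstone)."""
    kind: str
    value: str  # indexed value, i.e. the index row key
    row: str    # parent-CF row key the entry points at
    ts: int


def compact(cells: List[Cell], indexed_column: str) -> Tuple[List[Cell], List[IndexRecord]]:
    """Keep the newest version per (row, column); also emit index records:
    a tombstone for each discarded indexed value, plus an entry for the survivor."""
    groups: Dict[Tuple[str, str], List[Cell]] = {}
    for c in cells:
        groups.setdefault((c.row, c.column), []).append(c)

    survivors: List[Cell] = []
    index_records: List[IndexRecord] = []
    for (row, column), versions in sorted(groups.items()):
        versions.sort(key=lambda c: c.ts)
        winner = versions[-1]
        survivors.append(winner)
        if column != indexed_column:
            continue
        for loser in versions[:-1]:
            if loser.value != winner.value:
                # Blind tombstone for the stale entry, timestamped to shadow it.
                index_records.append(IndexRecord("del", loser.value, row, winner.ts))
        index_records.append(IndexRecord("put", winner.value, row, winner.ts))
    return survivors, index_records


def live_entries(records: List[IndexRecord]) -> Set[Tuple[str, str]]:
    """Resolve index records per (value, row): newest wins, tombstone wins a tie."""
    latest: Dict[Tuple[str, str], IndexRecord] = {}
    for r in records:
        key = (r.value, r.row)
        cur = latest.get(key)
        if cur is None or r.ts > cur.ts or (r.ts == cur.ts and r.kind == "del"):
            latest[key] = r
    return {key for key, r in latest.items() if r.kind == "put"}


writes = [
    Cell("row1", "state", "CA", 1),
    Cell("row1", "state", "TX", 2),
    Cell("row1", "name", "ann", 1),
]
# Write path: one blind index put per write, no read-before-write.
index_log = [IndexRecord("put", c.value, c.row, c.ts) for c in writes if c.column == "state"]
print(sorted(live_entries(index_log)))  # [('CA', 'row1'), ('TX', 'row1')] (stale CA entry)
survivors, new_records = compact(writes, "state")
print(sorted(live_entries(index_log + new_records)))  # [('TX', 'row1')]
```

Both the new entry and the tombstone are blind appends into the index, so in this model compaction never has to read the index to purge it.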

I like this idea; it should lower the overhead of indexes a lot, even for SSD 
deployments. (The read-before-write that the current implementation requires 
costs extra locking as well as the read itself.)
                
> Secondary indexes without read-before-write
> -------------------------------------------
>
>                 Key: CASSANDRA-2897
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2897
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>    Affects Versions: 0.7.0
>            Reporter: Sylvain Lebresne
>            Priority: Minor
>              Labels: secondary_index
>
> Currently, secondary index updates require a read-before-write to maintain 
> the index consistency. Keeping the index consistent at all times is not 
> necessary, however. We could let the (secondary) index get inconsistent on 
> writes and repair those on reads. This would be easy because on reads, we 
> make sure to request the indexed columns anyway, so we can just skip the rows 
> that are not needed and repair the index at the same time.
> This does trade work on writes for work on reads. However, read-before-write 
> is sufficiently costly that it will likely be a win overall.
> There are (at least) two small technical difficulties here, though:
> # If we repair on read, this will be racy with writes, so we'll probably have 
> to synchronize there.
> # We probably shouldn't rely only on reads to repair the index; we should also 
> have a task to repair the index for things that are rarely read. It's unclear 
> how to make that low impact, though.
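
A toy, single-process Python sketch of the proposal, to make the write/read trade and the two difficulties concrete. Everything here is hypothetical (`LazyIndexedStore`, its methods, the lock striping); it is not Cassandra's code. Writes add an index entry without reading the old value. `query()` validates each candidate against the indexed column it fetches anyway and drops stale entries, re-checking under the row's lock so a repair cannot delete an entry that a concurrent write just made valid (difficulty 1). `sweep()` is a naive full background pass for rarely read entries (difficulty 2); it makes no attempt to bound its cost, which is the open question.

```python
import threading
from typing import Dict, List, Optional, Set, Tuple


class LazyIndexedStore:
    """Toy store with one secondary index that is repaired lazily (hypothetical)."""

    def __init__(self, indexed_column: str, stripes: int = 16) -> None:
        self.indexed_column = indexed_column
        self.rows: Dict[str, Dict[str, Tuple[str, int]]] = {}  # row -> column -> (value, ts)
        self.index: Dict[str, Set[str]] = {}  # value -> rows that MAY hold it
        self._row_locks = [threading.Lock() for _ in range(stripes)]  # striped per-row locks
        self._index_lock = threading.Lock()  # guards self.index; always taken last

    def _lock(self, row: str) -> threading.Lock:
        return self._row_locks[hash(row) % len(self._row_locks)]

    def _current(self, row: str) -> Optional[str]:
        cell = self.rows.get(row, {}).get(self.indexed_column)
        return cell[0] if cell else None

    def write(self, row: str, column: str, value: str, ts: int) -> None:
        with self._lock(row):
            cols = self.rows.setdefault(row, {})
            cur = cols.get(column)
            # Last-write-wins here stands in for the storage engine's own merge.
            if cur is None or ts >= cur[1]:
                cols[column] = (value, ts)
            if column == self.indexed_column:
                # Blind index insert: the previous value's entry is left behind.
                with self._index_lock:
                    self.index.setdefault(value, set()).add(row)

    def _repair(self, value: str, row: str) -> bool:
        """Drop (value, row) from the index if the row no longer holds value."""
        with self._lock(row):
            # Re-check under the row lock: a concurrent write may have just set value.
            if self._current(row) == value:
                return False
            with self._index_lock:
                rows = self.index.get(value)
                if rows is None or row not in rows:
                    return False
                rows.discard(row)
                if not rows:
                    del self.index[value]
                return True

    def query(self, value: str) -> List[str]:
        """Rows whose indexed column equals value; stale entries are repaired on the way."""
        with self._index_lock:
            candidates = sorted(self.index.get(value, ()))
        hits = []
        for row in candidates:
            # A real read fetches the indexed column anyway, so this check is cheap.
            if self._current(row) == value:
                hits.append(row)
            else:
                self._repair(value, row)
        return hits

    def sweep(self) -> int:
        """Naive background pass for rarely read entries; returns entries removed."""
        with self._index_lock:
            pairs = [(v, r) for v, rows in self.index.items() for r in rows]
        return sum(1 for v, r in pairs if self._repair(v, r))


store = LazyIndexedStore("state")
store.write("row1", "state", "CA", 1)
store.write("row1", "state", "TX", 2)  # leaves a stale ("CA", "row1") entry
store.write("row2", "state", "CA", 1)
print(store.query("CA"))  # ['row2']
print(store.index["CA"])  # {'row2'}: the stale entry was repaired by the read
store.write("row3", "state", "NY", 1)
store.write("row3", "state", "WA", 2)  # stale ("NY", "row3"), never queried
print(store.sweep())  # 1
```

The optimistic check in `query()` takes no row lock on the common path; only a mismatch pays for the lock and the re-check.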

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
