[ 
https://issues.apache.org/jira/browse/CASSANDRA-2105?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sylvain Lebresne updated CASSANDRA-2105:
----------------------------------------

    Attachment: 2115_option2_nolock.patch
                2115_option1_withLock.patch

I've attached not one but two options for this patch. I'm not sure which version
to go with, so I'm asking for opinions.

Version 1 is the one extracted from #1546. It uses a ReadWriteLock to protect 
against the race condition.
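
To make the idea behind Version 1 concrete, here is a minimal sketch of guarding
the memtable-to-sstable switch with a ReadWriteLock. It is not the code from the
patch: the class LockedView, its methods, and the use of plain strings as
stand-ins for memtables and sstables are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical stand-in for the store's bookkeeping; not Cassandra's real classes.
final class LockedView {
    private final ReadWriteLock lock = new ReentrantReadWriteLock();
    private final List<String> flushing = new ArrayList<>(); // memtables pending flush
    private final List<String> sstables = new ArrayList<>(); // live sstables

    // A memtable becomes "pending flush".
    void startFlush(String memtable) {
        lock.writeLock().lock();
        try {
            flushing.add(memtable);
        } finally {
            lock.writeLock().unlock();
        }
    }

    // The flush finishes: drop the memtable and publish its sstable in one step,
    // so no reader can observe both (or neither) of them.
    void finishFlush(String memtable, String sstable) {
        lock.writeLock().lock();
        try {
            flushing.remove(memtable);
            sstables.add(sstable);
        } finally {
            lock.writeLock().unlock();
        }
    }

    // Readers grab both sets of references under the read lock.
    List<String> snapshot() {
        lock.readLock().lock();
        try {
            List<String> sources = new ArrayList<>(flushing);
            sources.addAll(sstables);
            return sources;
        } finally {
            lock.readLock().unlock();
        }
    }
}
```

Many readers can hold the read lock at once, so they only wait while a switch is
actually in progress; the cost is that every read on the critical path takes the
lock.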

Version 2 doesn't use a lock, so there is less chance of lock contention, which
is always good. The only problem is that it still suffers, in theory, from a
race condition. But I think this race condition is borderline impossible.
Basically, given a memtable m being flushed, let s(m) be the sstable initially
produced by the flush and let s'(m) denote any sstable resulting from a
compaction of s(m). The race occurs if a read thread sees m when grabbing the
references to the memtables being flushed and then sees s'(m) when grabbing the
references to the sstables. (Seeing s(m) rather than s'(m) is the initial race
condition, and that one is not impossible at all.)
If this is unclear, the code has a comment that may explain it more clearly.
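
The interleaving can be replayed deterministically in a toy model. The sketch
below is only an illustration under stated assumptions: the Source class, the
"origin" link that lets a reader recognise s(m) as a copy of m, and the numbers
are all hypothetical and are not how the attached patches work.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: the names and the "origin" link are hypothetical.
final class RaceSketch {
    // A data source holding a counter delta. origin names the memtable it was
    // flushed from, or null if that link is unknown (e.g. after compaction).
    static final class Source {
        final String name;
        final String origin;
        final long delta;

        Source(String name, String origin, long delta) {
            this.name = name;
            this.origin = origin;
            this.delta = delta;
        }
    }

    // Sum the deltas, skipping an sstable whose origin memtable was already read.
    static long read(List<Source> memtables, List<Source> sstables) {
        long total = 0;
        List<String> seen = new ArrayList<>();
        for (Source m : memtables) {
            total += m.delta;
            seen.add(m.name);
        }
        for (Source s : sstables) {
            if (s.origin != null && seen.contains(s.origin)) {
                continue; // same data as a memtable we already counted
            }
            total += s.delta;
        }
        return total;
    }

    // The flush completes between the two grabs: the reader sees m and s(m).
    static long flushOnly() {
        List<Source> memtables = List.of(new Source("m", null, 5));
        List<Source> sstables = List.of(
                new Source("old", null, 2),
                new Source("s(m)", "m", 5));
        return read(memtables, sstables);
    }

    // Flush AND compaction complete between the two grabs: the reader sees m
    // and s'(m), which merged s(m) with "old" and no longer points back to m.
    static long flushThenCompaction() {
        List<Source> memtables = List.of(new Source("m", null, 5));
        List<Source> sstables = List.of(new Source("s'(m)", null, 7));
        return read(memtables, sstables);
    }
}
```

In this model the true counter value is 7. flushOnly() returns 7 because the
duplicate is recognised, while flushThenCompaction() returns 12 because m is
counted once directly and once more inside s'(m). For that to happen, a whole
flush plus a compaction must fit between two consecutive reference grabs of a
single read, which is why the race seems so remote.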

So I'm not sure which version to go with. I lean slightly towards Version 1
because I usually side with correctness before anything else, but since this is
on a critical path it feels slightly wasteful to use a lock for this, given how
remote the race condition of Version 2 seems.


> Fix the read race condition in CFStore for counters 
> ----------------------------------------------------
>
>                 Key: CASSANDRA-2105
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2105
>             Project: Cassandra
>          Issue Type: Bug
>    Affects Versions: 0.8
>            Reporter: Sylvain Lebresne
>            Assignee: Sylvain Lebresne
>              Labels: counters
>             Fix For: 0.8
>
>         Attachments: 2115_option1_withLock.patch, 2115_option2_nolock.patch
>
>   Original Estimate: 4h
>  Remaining Estimate: 4h
>
> There is a (known) race condition during counter reads. Indeed, for a standard
> column family there is a short time during which a memtable is both active and
> pending flush and, similarly, a short time during which a 'memtable' is both
> pending flush and an active sstable. For counters, that implies sometimes
> reconciling the same counterColumn twice during a read and thus over-counting.
> The current code changes this slightly by trading the possibility of counting a
> given counterColumn twice for the possibility of missing a counterColumn. Thus
> it trades over-counts for under-counts.
> But this is no fix, and there is no hope of offering clients any kind of
> guarantee on reads unless we fix this.

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
