[ 
https://issues.apache.org/jira/browse/CASSANDRA-2893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sylvain Lebresne updated CASSANDRA-2893:
----------------------------------------

    Attachment: 0003-Add-AtomicSortedColumn-and-snapTree.patch
                0002-Make-memtable-use-CF.addAll.patch
                0001-Move-deletion-infos-into-ISortedColumns.patch

Attaching initial patches.

The basic idea is to make it so that applying a mutation to a memtable is 
atomic, or in other words, make it use CF.addAll() and have that last operation 
be atomic and isolated (adding to the row cache also needs to be atomic and 
isolated but it uses CF.addAll already so making CF.addAll atomic is the 
solution for that too).

To do that, addAll copies the initial cf, adds the new columns to the copy, and 
atomically compare-and-swaps the copy with the old cf. To make this efficient, 
the patch uses the snapTree O(1) cloning (copy-on-write) facilities.
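
As a rough illustration of that copy-then-compare-and-swap loop, here is a 
minimal sketch. It is not the code from the patch: the class name 
CasSortedColumns and the String column types are made up for the example, and a 
plain TreeMap copy (O(n)) stands in for the snapTree O(1) clone.

```java
import java.util.Collections;
import java.util.SortedMap;
import java.util.TreeMap;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical stand-in for a column container; not the class from the patch.
final class CasSortedColumns {
    // Readers always see one complete snapshot that is never modified in place.
    private final AtomicReference<SortedMap<String, String>> ref =
            new AtomicReference<>(Collections.unmodifiableSortedMap(new TreeMap<String, String>()));

    // Adds all the given columns as a single atomic, isolated step.
    void addAll(SortedMap<String, String> columns) {
        while (true) {
            SortedMap<String, String> current = ref.get();
            // Assumption: O(n) copy here; an O(1) copy-on-write clone would replace it.
            TreeMap<String, String> modified = new TreeMap<>(current);
            modified.putAll(columns);
            // Publish only if no other writer swapped in a new snapshot meanwhile.
            if (ref.compareAndSet(current, Collections.unmodifiableSortedMap(modified))) {
                return;
            }
            // Lost the race: loop and redo the work against the newer snapshot.
        }
    }

    // Returns the snapshot that is current at the time of the call.
    SortedMap<String, String> snapshot() {
        return ref.get();
    }
}
```

A reader that grabbed snapshot() before an addAll keeps seeing the old columns, 
and a reader that grabs it afterwards sees all the new columns at once, never a 
partial mix.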

I'm attaching the snapTree jar, but note that it's modified from the original 
(https://github.com/nbronson/snaptree) because it has a bug. The modified version 
with the small fix is at https://github.com/pcmanus/snaptree (I've issued a 
pull request). Btw, I don't know if the license of snapTree is compatible with 
the ASF one. Note that we only use the copy-on-write clone facility of 
snapTree, and not really the fact that it is thread-safe outside of that. So in 
particular a persistent sorted map could be used in place of snapTree if we 
wanted to, though the copy-on-write used by the latter is likely to generate 
less garbage overall.

I'm attaching 3 patches:
* The first patch pushes the CF deletion infos from the AbstractColumnContainer 
to the ISortedColumns implementation. The reason is that we want both updates 
and deletes to be atomic and isolated, so they need to live in the same 
structure.
* The second patch modifies Memtable.apply() to use CF.addAll directly.
* The third patch introduces AtomicSortedColumns using snapTree and uses it 
whenever thread-safety/isolation is needed. Note that it fully replaces 
ThreadSafeSortedColumns, which is removed, and that the patch tries to limit 
the use of AtomicSortedColumns to concurrent contexts, making 
TreeMapBackedSortedColumns the default for non-concurrent contexts.
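
To show why the deletion infos need to sit in the same structure as the 
columns, here is a small sketch in which both are held by one immutable object 
that is swapped in a single compare-and-set. The names AtomicRow, Holder and 
markedForDeleteAt, as well as the "keep the highest deletion timestamp" rule, 
are assumptions made for the example and are not taken from the patches.

```java
import java.util.Collections;
import java.util.SortedMap;
import java.util.TreeMap;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical sketch; not the AtomicSortedColumns class from the patch.
final class AtomicRow {
    // Immutable pair: deletion info and columns are always published together.
    static final class Holder {
        final long markedForDeleteAt;
        final SortedMap<String, String> columns;

        Holder(long markedForDeleteAt, SortedMap<String, String> columns) {
            this.markedForDeleteAt = markedForDeleteAt;
            this.columns = Collections.unmodifiableSortedMap(columns);
        }
    }

    private final AtomicReference<Holder> ref =
            new AtomicReference<>(new Holder(Long.MIN_VALUE, new TreeMap<String, String>()));

    // Applies a deletion timestamp and a set of columns as one atomic step.
    void apply(long deleteAt, SortedMap<String, String> newColumns) {
        Holder current;
        Holder modified;
        do {
            current = ref.get();
            TreeMap<String, String> copy = new TreeMap<>(current.columns);
            copy.putAll(newColumns);
            // Assumed merge rule for the example: the most recent deletion wins.
            modified = new Holder(Math.max(current.markedForDeleteAt, deleteAt), copy);
        } while (!ref.compareAndSet(current, modified));
    }

    // Returns the holder that is current at the time of the call.
    Holder snapshot() {
        return ref.get();
    }
}
```

Because a reader only ever dereferences one Holder, it cannot observe the new 
deletion timestamp together with the old columns, or the reverse. With the 
deletion infos kept in a separate object, that guarantee would need a second 
coordination mechanism.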

There are two gray areas with this patch that I know of:
* It would be easy to break isolation for super columns. If cf is an 
AtomicSortedColumns backed (super) column family and you do a {{sc = 
cf.getColumn(someSCname)}} and then do {{sc.addAll(cols)}}, then that last 
operation won't be in isolation. I don't think we do that in any context where 
it matters, but still something to be aware of.
* Iterator based removal is not thread-safe. Basically, if you iterate and 
remove through the iterator's remove() method while there is a concurrent 
mutation on the cf, the remove may well just be ignored. I think the main 
place where we do iterator based removes is during CFS.removeDeleted(). But 
it's mostly done during queries/compaction so not in a concurrent context. We 
do a removeDeleted on cachedRow sometimes during compaction but in that case it 
won't be the end of the world if that remove is ignored because of a concurrent 
mutation. Still, not very beautiful but I don't see a simple solution (outside 
of not using iterator based removes that is).
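
The second gray area can be pictured with the sketch below. It assumes, purely 
for illustration, that an iterator-style remove makes a single compare-and-set 
attempt against the snapshot being iterated, while regular writes retry. The 
class and method names are hypothetical and the actual mechanism in the patch 
may differ.

```java
import java.util.Collections;
import java.util.SortedMap;
import java.util.TreeMap;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical illustration only; none of these names come from the patch.
final class SingleAttemptRemove {
    final AtomicReference<SortedMap<String, String>> ref =
            new AtomicReference<>(Collections.unmodifiableSortedMap(new TreeMap<String, String>()));

    // Writer path: copy, modify, and retry the CAS until it succeeds.
    void add(String name, String value) {
        SortedMap<String, String> current;
        TreeMap<String, String> modified;
        do {
            current = ref.get();
            modified = new TreeMap<>(current);
            modified.put(name, value);
        } while (!ref.compareAndSet(current, Collections.unmodifiableSortedMap(modified)));
    }

    // Iterator-style remove: one CAS attempt against the snapshot being iterated.
    // Returns false when a concurrent writer replaced that snapshot first.
    boolean removeSeenIn(SortedMap<String, String> iterated, String name) {
        TreeMap<String, String> modified = new TreeMap<>(iterated);
        modified.remove(name);
        return ref.compareAndSet(iterated, Collections.unmodifiableSortedMap(modified));
    }
}
```

If a snapshot is taken, then add() runs, and only then removeSeenIn() is called 
with the earlier snapshot, the compare-and-set fails and the column stays in 
place, which matches the "remove is ignored" behaviour described above. 
Retrying the remove would fix that, but it no longer fits the 
Iterator.remove() contract of acting on the element just returned.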

Overall, I think the patch is ready for benchmarking (all unit tests are 
passing). I did a very quick stress test on my localhost and I didn't see any 
noticeable difference with or without the patch (neither writes nor reads).  
But 1) that was not a very scientific benchmark and 2) it was a short 
benchmark. I don't think raw performance will be a problem with this patch; the 
problem is that it generates more garbage, which may itself degrade performance 
in the long run. That's probably what we'll want to benchmark.
                
> Add row-level isolation
> -----------------------
>
>                 Key: CASSANDRA-2893
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2893
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Jonathan Ellis
>            Assignee: Sylvain Lebresne
>            Priority: Minor
>             Fix For: 1.1
>
>         Attachments: 0001-Move-deletion-infos-into-ISortedColumns.patch, 
> 0002-Make-memtable-use-CF.addAll.patch, 
> 0003-Add-AtomicSortedColumn-and-snapTree.patch, snaptree-0.1-SNAPSHOT.jar
>
>
> This could be done using the atomic ConcurrentMap operations from the 
> Memtable and something like http://code.google.com/p/pcollections/ to replace 
> the ConcurrentSkipListMap in ThreadSafeSortedColumns.  The trick is that 
> pcollections does not provide a SortedMap, so we probably need to write our 
> own.
> Googling [persistent sortedmap] I found 
> http://code.google.com/p/actord/source/browse/trunk/actord/src/main/scala/ff/collection
>  (in scala) and http://clojure.org/data_structures#Data Structures-Maps.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
