[
https://issues.apache.org/jira/browse/CASSANDRA-2893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Sylvain Lebresne updated CASSANDRA-2893:
----------------------------------------
Attachment: 0003-Add-AtomicSortedColumn-and-snapTree.patch
0002-Make-memtable-use-CF.addAll.patch
0001-Move-deletion-infos-into-ISortedColumns.patch
Attaching initial patches.
The basic idea is to make it so that applying a mutation to a memtable is
atomic, or in other words, make it use CF.addAll() and have that last operation
be atomic and isolated (adding to the row cache also needs to be atomic and
isolated but it uses CF.addAll already so making CF.addAll atomic is the
solution for that too).
To do that, addAll copies the initial cf, adds the new columns to the copy, and
atomically compare-and-swaps it with the old cf. To make this efficient, the
patch uses the snapTree O(1) cloning (copy-on-write) facilities.
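
A minimal sketch of that copy / modify / compare-and-swap loop, for
illustration only. It is not the code from the patch: the class name
AtomicColumnsSketch and its methods are hypothetical, columns are reduced to
String name/value pairs with no timestamp reconciliation, and a plain
java.util.TreeMap copy (O(n)) stands in for the snapTree O(1) clone.

```java
import java.util.Collections;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical illustration of the idea; not the AtomicSortedColumns class from the patch.
final class AtomicColumnsSketch {
    // Holds the current snapshot. A published map is never mutated again.
    private final AtomicReference<TreeMap<String, String>> ref =
            new AtomicReference<>(new TreeMap<String, String>());

    // Adds every column in one atomic, isolated step: a reader sees either
    // none of the new columns or all of them.
    void addAll(Map<String, String> columns) {
        while (true) {
            TreeMap<String, String> current = ref.get();
            // Assumption: a full copy here; the patch uses snapTree's O(1) clone instead.
            TreeMap<String, String> modified = new TreeMap<>(current);
            modified.putAll(columns);
            if (ref.compareAndSet(current, modified)) {
                return;
            }
            // A concurrent writer swapped first; retry against the newer snapshot.
        }
    }

    // Returns a read-only view of the snapshot current at call time.
    SortedMap<String, String> snapshot() {
        return Collections.unmodifiableSortedMap(ref.get());
    }
}
```

A reader that called snapshot() before a concurrent addAll keeps seeing the
old columns, which is the isolation property being sought. The cost is one
discarded map per update (plus one per failed swap), hence the garbage concern.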
I'm attaching the snapTree jar, but note that it's modified from the original
(https://github.com/nbronson/snaptree) because it has a bug. The modified version
with the small fix is at https://github.com/pcmanus/snaptree (I've issued a
pull request). Btw, I don't know if the license of snapTree is compatible with
the ASF one. Note that we only use the copy-on-write clone facility of
snapTree, and not really the fact that it is thread-safe outside of that. So in
particular a persistent sorted map could be used in place of snapTree if we
wanted to, though the copy-on-write used by the latter is likely to generate
less garbage overall.
I'm attaching 3 patches:
* The first patch pushes the CF deletion infos from the AbstractColumnContainer
to the ISortedColumns implementation. The reason is that we want both updates
and deletes to be atomic and isolated, so they need to live in the same
structure.
* The second patch modifies Memtable.apply() to use CF.addAll directly.
* The third patch introduces AtomicSortedColumns using snapTree and uses it
whenever thread-safety/isolation is needed. Note that it fully replaces
ThreadSafeSortedColumns, which is removed, and also that the patch tries to
limit the use of AtomicSortedColumns to concurrent contexts, making
TreeMapBackedSortedColumns the default for non-concurrent contexts.
There are two gray areas with this patch that I know of:
* It would be easy to break isolation for super columns. If cf is an
AtomicSortedColumns backed (super) column family and you do a {{sc =
cf.getColumn(someSCname)}} and then do {{sc.addAll(cols)}}, then that last
operation won't be in isolation. I don't think we do that in any context where
it matters, but still something to be aware of.
* Iterator based removal is not thread-safe. Basically, if you iterate and
remove through the iterator's remove() method while there is a concurrent
mutation on the cf, the remove may well just be ignored. I think the main
place where we do iterator based removes is during CFS.removeDeleted(). But
it's mostly done during queries/compaction so not in a concurrent context. We
do a removeDeleted on cachedRow sometimes during compaction but in that case it
won't be the end of the world if that remove is ignored because of a concurrent
mutation. Still, not very beautiful but I don't see a simple solution (outside
of not using iterator based removes that is).
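
To illustrate the first gray area, here is a hedged sketch of one way isolation
can be bypassed. It assumes the outer container is swapped atomically while the
nested (super column) maps are shared mutable objects. The class
SuperColumnIsolationSketch and its methods are hypothetical and simplified,
not the patch's code.

```java
import java.util.TreeMap;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical model: the outer map is replaced by compare-and-swap, but the
// inner sub-column maps are shared between successive snapshots.
final class SuperColumnIsolationSketch {
    private final AtomicReference<TreeMap<String, TreeMap<String, String>>> ref =
            new AtomicReference<>(new TreeMap<String, TreeMap<String, String>>());

    // Isolated at the top level: copy, add, then swap.
    void addSuperColumn(String name, TreeMap<String, String> subColumns) {
        while (true) {
            TreeMap<String, TreeMap<String, String>> current = ref.get();
            // Shallow copy: the inner maps are NOT copied.
            TreeMap<String, TreeMap<String, String>> modified = new TreeMap<>(current);
            modified.put(name, subColumns);
            if (ref.compareAndSet(current, modified)) {
                return;
            }
        }
    }

    // What a reader would hold on to for the duration of a read.
    TreeMap<String, TreeMap<String, String>> snapshot() {
        return ref.get();
    }

    // Hands out the shared inner map; mutating it never goes through the swap.
    TreeMap<String, String> getColumn(String name) {
        return ref.get().get(name);
    }
}
```

In this model, calling put() on the map returned by getColumn() changes what
an already-taken snapshot() shows, so a reader can observe a partially applied
update. Routing every change through the top-level copy-and-swap avoids that.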
Overall, I think the patch is ready for benchmarking (all unit tests are
passing). I did a very quick stress test on my localhost and I didn't see any
noticeable difference with or without the patch (for either writes or reads).
But 1) that was not a very scientific benchmark and 2) it was a short
benchmark. I don't think raw performance will be a problem with this patch; the
problem is that it generates more garbage, which itself may degrade performance
in the long run. That's probably what we'll want to benchmark.
> Add row-level isolation
> -----------------------
>
> Key: CASSANDRA-2893
> URL: https://issues.apache.org/jira/browse/CASSANDRA-2893
> Project: Cassandra
> Issue Type: Improvement
> Reporter: Jonathan Ellis
> Assignee: Sylvain Lebresne
> Priority: Minor
> Fix For: 1.1
>
> Attachments: 0001-Move-deletion-infos-into-ISortedColumns.patch,
> 0002-Make-memtable-use-CF.addAll.patch,
> 0003-Add-AtomicSortedColumn-and-snapTree.patch, snaptree-0.1-SNAPSHOT.jar
>
>
> This could be done using the atomic ConcurrentMap operations from the
> Memtable and something like http://code.google.com/p/pcollections/ to replace
> the ConcurrentSkipListMap in ThreadSafeSortedColumns. The trick is that
> pcollections does not provide a SortedMap, so we probably need to write our
> own.
> Googling [persistent sortedmap] I found
> http://code.google.com/p/actord/source/browse/trunk/actord/src/main/scala/ff/collection
> (in scala) and http://clojure.org/data_structures#Data Structures-Maps.