Yes, each CF has its own memtable. The writes are still atomic in the
sense that I can do an all-or-nothing write to multiple CFs (the
CommitLog still logs the whole row). Having multiple CFs, each with its
own memtable, simply means that concurrent operations may not be
*isolated* from each other. So, if I have 2 operations:
Op1: Write(key1, CF1:col1=new, CF2:col2=new)
Op2: Read(key1, CF1:col1, CF2:col2)
Assuming both columns previously held the value old, then depending on
the execution schedule Op2 could return one of:
old, old -- Op2 before Op1
old, new -- Op1 writes CF2, then Op2 gets scheduled
new, old -- Op1 writes CF1, then Op2 gets scheduled
new, new -- Op1 before Op2
But with time (eventually), re-executing Op2 will always return the
final result (new, new).
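
To make the four cases concrete, here is a small sketch in Python that
enumerates the schedules. It is only a model, not Cassandra code: it
assumes Op1 is applied as two independent memtable updates (in either
order) and that Op2 reads both columns at a single point in the
schedule. The function name possible_reads is made up for illustration.

```python
from itertools import permutations


def possible_reads():
    """Map each (CF1:col1, CF2:col2) result Op2 can see to the schedules producing it.

    Toy model (an assumption, not Cassandra internals): Op1 is two
    independent memtable updates, and Op2 reads both columns at one instant.
    """
    outcomes = {}
    for order in permutations(["CF1", "CF2"]):   # order in which Op1's updates land
        for applied in range(3):                 # updates applied before Op2 runs
            row = {"CF1": "old", "CF2": "old"}
            for cf in order[:applied]:
                row[cf] = "new"
            result = (row["CF1"], row["CF2"])
            outcomes.setdefault(result, []).append((order, applied))
    return outcomes


if __name__ == "__main__":
    for result in sorted(possible_reads()):
        print(result)
```

Once both updates have been applied (applied == 2), every schedule
yields new, new, which is the "eventually" part.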
I agree that this is of limited value right now, but atomicity without
isolation can still be useful. It'll save the app some cleanup and
book-keeping code.
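
Here is a rough sketch of what I mean by all-or-nothing, again as a toy
model in Python rather than the real code. ToyNode, its methods and the
crash_before_apply flag are hypothetical names; the only thing taken
from the actual system is the idea that the whole row goes into one
commit log record, which I am assuming gets replayed on restart.

```python
import json


class ToyNode:
    """Hypothetical node: one commit log, one dict-based memtable per CF."""

    def __init__(self, log=None):
        self.log = list(log or [])
        self.memtables = {}
        for record in self.log:            # replay the log on startup
            self._apply(json.loads(record))

    def _apply(self, row):
        for cf, columns in row["updates"].items():
            self.memtables.setdefault(cf, {}).setdefault(row["key"], {}).update(columns)

    def write(self, key, updates, crash_before_apply=False):
        # The whole row (all CFs) is appended as ONE log record.
        self.log.append(json.dumps({"key": key, "updates": updates}))
        if crash_before_apply:
            return                         # simulate dying before any memtable update
        self._apply({"key": key, "updates": updates})

    def read(self, key, cf, col):
        return self.memtables.get(cf, {}).get(key, {}).get(col)


def demo():
    node = ToyNode()
    node.write("key1", {"CF1": {"col1": "new"}, "CF2": {"col2": "new"}},
               crash_before_apply=True)
    before = (node.read("key1", "CF1", "col1"), node.read("key1", "CF2", "col2"))
    restarted = ToyNode(log=node.log)      # recovery replays the single record
    after = (restarted.read("key1", "CF1", "col1"),
             restarted.read("key1", "CF2", "col2"))
    return before, after
```

In this model a write is either in the log (and both CFs get it after
replay) or it is not (and neither does), so the app never has to clean
up a half-applied update.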
On Wed, Apr 22, 2009 at 9:36 AM, Jonathan Ellis jbel...@gmail.com wrote:
On Wed, Apr 22, 2009 at 11:32 AM, Sandeep Tata sandeep.t...@gmail.com wrote:
Having multiple CFs in a row could be useful for writes. Consider the
case when you use one CF to store the data and another to store some
kind of secondary index on that data. It will be useful to apply
updates to both families atomically.
Except that's not how it works: each Memtable (CF) has its own
executor thread, so even if you put multiple CFs in a Row it's not
going to be atomic with the current system. Adding some kind of
coordination there is a big enough change that I don't think it's
worth it. (And you have seen that I am not scared of big
changes, so that should give you pause. :)
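
A toy illustration of the executor-per-memtable point, in Python for
brevity. This is a sketch under assumptions, not the actual code:
ToyStore and its methods are hypothetical, each CF gets a
single-threaded executor, and CF2's thread is stalled on purpose so the
partially applied state is reproducible.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class ToyStore:
    """Hypothetical store: one memtable and one single-thread executor per CF."""

    def __init__(self, cfs):
        self.memtables = {cf: {} for cf in cfs}
        self.executors = {cf: ThreadPoolExecutor(max_workers=1) for cf in cfs}

    def _apply(self, cf, key, col, value):
        self.memtables[cf][(key, col)] = value

    def write(self, key, updates):
        # Each CF's update is handed to that CF's own executor; nothing
        # coordinates when the updates become visible relative to each other.
        return [self.executors[cf].submit(self._apply, cf, key, col, value)
                for cf, (col, value) in updates.items()]

    def read(self, key, cols):
        return tuple(self.memtables[cf].get((key, col)) for cf, col in cols)

    def close(self):
        for executor in self.executors.values():
            executor.shutdown(wait=True)


def demo():
    store = ToyStore(["CF1", "CF2"])
    cols = [("CF1", "col1"), ("CF2", "col2")]
    for f in store.write("key1", {"CF1": ("col1", "old"), "CF2": ("col2", "old")}):
        f.result()

    gate = threading.Event()
    store.executors["CF2"].submit(gate.wait)   # stall CF2's thread on purpose
    futures = store.write("key1", {"CF1": ("col1", "new"), "CF2": ("col2", "new")})
    futures[0].result()                        # CF1's update has landed
    partial = store.read("key1", cols)         # CF2's update is still queued

    gate.set()                                 # let CF2 catch up
    for f in futures:
        f.result()
    final = store.read("key1", cols)
    store.close()
    return partial, final
```

Here demo() observes the row with only CF1 updated before both updates
eventually land, which is exactly the window a reader can fall into
without extra coordination between the executors.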
Back to YAGNI. :) Row doesn't fit in the current execution model, so
rather than leaving it as a half-baked creation, better to excise it.
If we ever decide to support atomic updates across CFs, that would be
the time to add it, or something like it, back.
-Jonathan