Row vs CF

2009-04-22 Thread Jonathan Ellis
In a bunch of places in the code we wrap a CF in a Row object,
basically a key + multiple CFs.  But currently only a single
ColumnFamily will ever be in a Row object.  (At least in the Rows
involved in a client read op.  Maybe Rows are used internally in other
places with multiple CFs.  But I am concerned with the read path
here.)

Is this an example where we should apply YAGNI?
(http://en.wikipedia.org/wiki/You_Ain%27t_Gonna_Need_It)  It seems to
me that if a CF is defined as data that is logically or otherwise
related, then adding an API to request multiple CFs at once is
unnecessary.  (If you really need data from multiple CFs
frequently, your data model is broken and you should combine the CFs;
if you need it infrequently, the overhead from doing multiple queries
is not a big deal.)

Thoughts?

-Jonathan


Re: Row vs CF

2009-04-22 Thread Sandeep Tata
Yes, each CF has its own memtable. The writes are atomic in the sense
that I can still do an all-or-nothing write to multiple CFs (the
CommitLog still logs the whole row). Having multiple CFs with their
own memtable simply means that concurrent operations may not be
*isolated* from each other. So, if I have 2 operations:

Op1: Write(key1, CF1:col1=new, CF2:col2=new)
Op2: Read(key1, CF1:col1, CF2:col2)

Assuming both columns previously held the value old, then depending on
the execution schedule Op2 could return one of:

old, old -- Op2 before Op1
old, new -- Op1 writes CF2, then Op2 gets scheduled
new, old -- Op1 writes CF1, then Op2 gets scheduled
new, new -- Op1 before Op2

But eventually, once Op1 has been applied to both CFs, re-executing
Op2 will always return the last result (new, new).
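
To make the four outcomes concrete, here is a toy Python enumeration of
the schedules. It is only a sketch, not Cassandra code: the function
name possible_reads and the dict-based "memtables" are made up for
illustration, and it assumes Op2 reads both CFs at a single point
between Op1's two per-CF writes.

```python
from itertools import permutations


def possible_reads():
    """Return every (CF1:col1, CF2:col2) pair Op2 could observe."""
    # Op1 consists of two independent per-CF writes (hypothetical model).
    writes = [("CF1", "col1"), ("CF2", "col2")]
    outcomes = set()
    # The two memtables may apply their part of Op1 in either order.
    for order in permutations(writes):
        # Op2 may run before any write, between the two, or after both.
        for read_at in range(len(order) + 1):
            memtables = {"CF1": {"col1": "old"}, "CF2": {"col2": "old"}}
            for cf, col in order[:read_at]:
                memtables[cf][col] = "new"
            outcomes.add((memtables["CF1"]["col1"], memtables["CF2"]["col2"]))
    return outcomes


if __name__ == "__main__":
    for outcome in sorted(possible_reads()):
        print(outcome)
```

The mixed results (old, new) and (new, old) are the ones that an
isolated read would never see.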

I agree that this is of limited value right now, but atomicity without
isolation can still be useful. It'll save the app some cleanup and
book-keeping code.
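
As a rough illustration of "atomic but not isolated", here is a toy
model in Python. Everything in it (ToyStore, crash_after, recover) is
hypothetical and much simpler than the real code; the only assumption
taken from the discussion above is that the commit log records the
whole row before the per-CF memtables are updated.

```python
class ToyStore:
    """Toy model: one commit log for whole rows, one memtable per CF."""

    def __init__(self):
        self.commit_log = []   # treated as durable in this sketch
        self.memtables = {}    # CF name -> {(key, column): value}

    def write(self, key, mutation, crash_after=None):
        """mutation maps CF name -> {column: value}.

        The whole row is logged first, then applied CF by CF.
        crash_after simulates dying after that many CFs were applied.
        """
        self.commit_log.append((key, mutation))
        for applied, (cf, columns) in enumerate(mutation.items()):
            if crash_after is not None and applied == crash_after:
                return False  # simulated crash mid-write
            self._apply(key, cf, columns)
        return True

    def _apply(self, key, cf, columns):
        table = self.memtables.setdefault(cf, {})
        for column, value in columns.items():
            table[(key, column)] = value

    def recover(self):
        """Rebuild all memtables by replaying the commit log."""
        self.memtables = {}
        for key, mutation in self.commit_log:
            for cf, columns in mutation.items():
                self._apply(key, cf, columns)

    def read(self, key, cf, column):
        return self.memtables.get(cf, {}).get((key, column))


if __name__ == "__main__":
    store = ToyStore()
    row = {"CF1": {"col1": "new"}, "CF2": {"col2": "new"}}
    store.write("key1", row, crash_after=1)  # only CF1 gets applied
    store.recover()                          # replay completes the row
    print(store.read("key1", "CF1", "col1"), store.read("key1", "CF2", "col2"))
```

A reader can catch the row half-applied before recovery, but the app
never has to clean up a permanently half-written row itself.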



On Wed, Apr 22, 2009 at 9:36 AM, Jonathan Ellis jbel...@gmail.com wrote:
> On Wed, Apr 22, 2009 at 11:32 AM, Sandeep Tata sandeep.t...@gmail.com wrote:
>> Having multiple CFs in a row could be useful for writes. Consider the
>> case when you use one CF to store the data and another to store some
>> kind of secondary index on that data. It will be useful to apply
>> updates to both families atomically.
>
> Except that's not how it works, each Memtable (CF) has its own
> executor thread so even if you put multiple CFs in a Row it's not
> going to be atomic with the current system, and it's a big enough
> change to try to add some kind of coordination there that I don't
> think it's worth it.  (And you have seen that I am not scared of big
> changes, so that should give you pause. :)
>
> Back to YAGNI. :)  Row doesn't fit in the current execution model, so
> rather than leaving it as a half-baked creation, better to excise it
> and if we ever decide to support atomic updates across CFs then that
> would be the time to add it or something like it back.
>
> -Jonathan