On Thu, Nov 26, 2009 at 7:12 PM, Anthony Molinaro
<[email protected]> wrote:
> Unless you are using order preserving partitioning which might or might not
> be what you want, you won't be able to do a full scan. Instead you should
> probably have two column families, one keyed by primary, one by secondary,
> each with a column for the other, then you can do your operations. It
> uses more space, but disk is cheap so probably not a big deal.
Yes, that's what we thought: the second column family only keeps a list
of the keys in the first one, without the data.
> If you
> have to model a many-to-many relationship you can use super columns.
For now we are only storing a single attribute, so we used normal
columns instead of super columns. In the end our schema is:
PrimaryCF
{ 'primary' => {'secondary'=>'data_0'} }
SecondaryCF
{ 'secondary' => {'primary'=>''} }
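To make the layout concrete, here is a minimal in-memory sketch in Python,
with plain dicts standing in for the two column families. The names
primary_cf, secondary_cf and insert are made up for illustration; none of
this is a Cassandra client call.

```python
# Plain dicts stand in for the two column families (not a Cassandra client).
primary_cf = {}    # row key: primary id   -> {secondary id: data}
secondary_cf = {}  # row key: secondary id -> {primary id: b""}


def insert(primary, secondary, data):
    """Write the data column and its reverse-index entry together."""
    primary_cf.setdefault(primary, {})[secondary] = data
    # Only the column name matters here; the value is an empty placeholder.
    secondary_cf.setdefault(secondary, {})[primary] = b""


insert("p1", "s1", "data_0")
insert("p2", "s1", "data_1")
```

With this, secondary_cf["s1"] lists every primary id that holds "s1",
which is the lookup a full scan would otherwise be needed for.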
I believe that using a SuperColumn in PrimaryCF would be necessary
only when using more than one attribute, or are there other
implications I'm not seeing?
As for the secondary, I don't like the idea of storing a dummy value
(new byte[0]) when I only need the name, is that a smell that I should
be using something else?
<snip>
> You do your inserts into both, and for deletes you do a get_slice for the
> secondary id, which will give you all primary ids which contain the
> secondary id. Then you can delete everything.
Yes, we actually did it a bit "smarter" by querying first and keeping
a list of only the diff between the first and second insert. Thanks a
lot for your answer, it's been very useful.