On Fri, 29 Jan 2010 15:07:01 -0600 Ted Zlatanov <t...@lifelogs.com> wrote: 

TZ> On Fri, 29 Jan 2010 12:06:28 -0600 Jonathan Ellis <jbel...@gmail.com> wrote:
JE> On Fri, Jan 29, 2010 at 9:09 AM, Mehar Chaitanya
JE> <meharchaita...@gmail.com> wrote:
>>>   1. This would lead to an enormous amount of duplication of data; in short,
>>>   if I now want to view the data from the IS_PUBLISHED dimension, then my
>>>   database size would scale up tremendously.

JE> Yes.  But disk space is so cheap it's worth using a lot of it to make
JE> other things fast.

TZ> IIUC, Mehar would be duplicating the article data for every article tag.

TZ> I searched the bug tracker and wiki and didn't find anything on the
TZ> topic of tag storage and search, so I don't think Cassandra supports
TZ> tags without data duplication.

TZ> Would it be possible to implement an optional byte[] bitmap field in
TZ> SliceRange?  If you can specify the bitmap as an optional field it would
TZ> not break current clients.  Then the search can return only the subset
TZ> of the range that matches the bitmap.  This would make sense for
TZ> BytesType and LongType, at least.

I looked at the source code and it seems that
StorageProxy::getSliceRange() is the focal point for reads and bitmap
matching should be implemented there.  The bitmap could be applied as a
filter before the other SliceRange parameters, especially the max number
of return results.  It may be worth the effort to send the bitmap down
to the ReadCommand/ColumnFamily level to reduce the number of potential
matches.
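
To make the idea concrete, here is a rough sketch of the filtering step.
It is not Cassandra code: BitmapSliceFilter, matches() and slice() are
hypothetical names, and the matching rule (every bit set in the bitmap
must also be set in the column name) is only one assumed reading of
"matches the bitmap".  The point it shows is the ordering: the bitmap is
applied before the result limit, so the limit counts only matching
columns.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Cassandra: illustrates one possible
// meaning of "matches the bitmap" for byte[] column names.
final class BitmapSliceFilter {

    // Assumed semantics: a name matches when every bit set in the bitmap
    // is also set in the name. A name shorter than the bitmap is treated
    // as zero-padded, so it matches only if the extra bitmap bytes are 0.
    static boolean matches(byte[] name, byte[] bitmap) {
        for (int i = 0; i < bitmap.length; i++) {
            int nameByte = i < name.length ? (name[i] & 0xFF) : 0;
            int maskByte = bitmap[i] & 0xFF;
            if ((nameByte & maskByte) != maskByte) {
                return false;
            }
        }
        return true;
    }

    // Applies the bitmap first, then the result limit (standing in for
    // the SliceRange count). A null bitmap means "no filtering", which
    // mirrors the idea of an optional field that old clients never set.
    static List<byte[]> slice(List<byte[]> names, byte[] bitmap, int count) {
        List<byte[]> result = new ArrayList<>();
        for (byte[] name : names) {
            if (result.size() >= count) {
                break;
            }
            if (bitmap == null || matches(name, bitmap)) {
                result.add(name);
            }
        }
        return result;
    }
}
```

With single-byte names 0x01, 0x03, 0x02, 0x07, a bitmap of 0x03 and a
count of 2, this keeps 0x03 and 0x07.  Applying the count first would
have looked only at 0x01 and 0x03 and returned a single column.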

If this is not feasible for technical reasons I'd like to know.
Otherwise I'll put it on my TODO list and produce a proposal (unless
someone more knowledgeable is interested, of course).

Ted
