On Fri, 29 Jan 2010 15:07:01 -0600 Ted Zlatanov <t...@lifelogs.com> wrote:
TZ> On Fri, 29 Jan 2010 12:06:28 -0600 Jonathan Ellis <jbel...@gmail.com> wrote:

JE> On Fri, Jan 29, 2010 at 9:09 AM, Mehar Chaitanya
JE> <meharchaita...@gmail.com> wrote:
>>> 1. This would lead to enormous amount of duplication of data, in short
>>> if I now want to view the data from IS_PUBLISHED dimension then my
>>> database size would scale up tremendously.

JE> Yes. But disk space is so cheap it's worth using a lot of it to make
JE> other things fast.

TZ> IIUC, Mehar would be duplicating the article data for every article tag.
TZ> I searched the bug tracker and wiki and didn't find anything on the
TZ> topic of tag storage and search, so I don't think Cassandra supports
TZ> tags without data duplication.

TZ> Would it be possible to implement an optional byte[] bitmap field in
TZ> SliceRange? If you can specify the bitmap as an optional field it would
TZ> not break current clients. Then the search can return only the subset
TZ> of the range that matches the bitmap. This would make sense for
TZ> BytesType and LongType, at least.

I looked at the source code and it seems that
StorageProxy::getSliceRange() is the focal point for reads, so bitmap
matching should be implemented there. The bitmap could be applied as a
filter before the other SliceRange parameters, especially the maximum
number of returned results, so that the limit counts only matching
columns. It may be worth the effort to send the bitmap down to the
ReadCommand/ColumnFamily level to reduce the number of potential
matches.

If this is not feasible for technical reasons I'd like to know.
Otherwise I'll put it on my TODO list and produce a proposal (unless
someone more knowledgeable is interested, of course).

Ted
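
P.S. To make the idea concrete, here is a rough standalone sketch of the filtering order described above. Nothing in it is Cassandra code: the class BitmapSliceSketch, its methods, and the matching rule (a column name matches when every bit set in the bitmap is also set in the name, with equal lengths required) are all assumptions for illustration. The real semantics would have to be settled in the proposal.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch only; not part of Cassandra's API.
public class BitmapSliceSketch {

    // Encode a long as 8 big-endian bytes, as a stand-in for a LongType column name.
    public static byte[] toBytes(long value) {
        return ByteBuffer.allocate(8).putLong(value).array();
    }

    // Decode 8 big-endian bytes back to a long.
    public static long toLong(byte[] bytes) {
        return ByteBuffer.wrap(bytes).getLong();
    }

    // Assumed matching rule: every bit set in the bitmap must be set in the name.
    // Names whose length differs from the bitmap's never match.
    public static boolean matches(byte[] name, byte[] bitmap) {
        if (name.length != bitmap.length) {
            return false;
        }
        for (int i = 0; i < bitmap.length; i++) {
            if ((name[i] & bitmap[i]) != bitmap[i]) {
                return false;
            }
        }
        return true;
    }

    // Apply the optional bitmap first, then the result limit, so that
    // 'count' only counts columns that passed the bitmap filter.
    // A null bitmap means "no filter", which keeps old clients working.
    public static List<byte[]> slice(List<byte[]> columnNames, byte[] bitmap, int count) {
        List<byte[]> result = new ArrayList<>();
        for (byte[] name : columnNames) {
            if (result.size() >= count) {
                break;
            }
            if (bitmap == null || matches(name, bitmap)) {
                result.add(name);
            }
        }
        return result;
    }
}
```

With column names 1, 3, 5 and 7 and a bitmap of 5 (binary 101), only 5 and 7 pass the filter, and a limit of 1 returns just 5 rather than stopping at the first unfiltered column. Pushing the same check down to the ReadCommand/ColumnFamily level would avoid shipping non-matching columns at all, but that is beyond this sketch.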