How would this be different from the byte[] column name you can already match on?

2010/2/1 Ted Zlatanov <t...@lifelogs.com>:
> On Fri, 29 Jan 2010 15:07:01 -0600 Ted Zlatanov <t...@lifelogs.com> wrote:
>
> TZ> On Fri, 29 Jan 2010 12:06:28 -0600 Jonathan Ellis <jbel...@gmail.com> wrote:
>
> JE> On Fri, Jan 29, 2010 at 9:09 AM, Mehar Chaitanya
> JE> <meharchaita...@gmail.com> wrote:
> >>>> 1. This would lead to enormous amount of duplication of data, in short
> >>>> if I now want to view the data from IS_PUBLISHED dimension then my
> >>>> database size would scale up tremendously.
>
> JE> Yes. But disk space is so cheap it's worth using a lot of it to make
> JE> other things fast.
>
> TZ> IIUC, Mehar would be duplicating the article data for every article tag.
>
> TZ> I searched the bug tracker and wiki and didn't find anything on the
> TZ> topic of tag storage and search, so I don't think Cassandra supports
> TZ> tags without data duplication.
>
> TZ> Would it be possible to implement an optional byte[] bitmap field in
> TZ> SliceRange? If you can specify the bitmap as an optional field it would
> TZ> not break current clients. Then the search can return only the subset
> TZ> of the range that matches the bitmap. This would make sense for
> TZ> BytesType and LongType, at least.
>
> I looked at the source code and it seems that
> StorageProxy::getSliceRange() is the focal point for reads and bitmap
> matching should be implemented there. The bitmap could be applied as a
> filter before the other SliceRange parameters, especially the max number
> of return results. It may be worth the effort to send the bitmap down
> to the ReadCommand/ColumnFamily level to reduce the number of potential
> matches.
>
> If this is not feasible for technical reasons I'd like to know.
> Otherwise I'll put it on my TODO list and produce a proposal (unless
> someone more knowledgeable is interested, of course).
>
> Ted
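
To make the quoted proposal concrete, here is a rough Java sketch of a bitmap filter applied before the result-count limit. It is only an illustration under stated assumptions, not Cassandra code: the class BitmapSliceSketch and the methods matches() and slice() are hypothetical, and the matching rule (every bit set in the bitmap must also be set in the column name) is a guess, since the thread does not define it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch; none of these names exist in Cassandra.
public class BitmapSliceSketch {

    // Assumed matching rule (not specified in the thread): a column name
    // matches when every bit set in the bitmap is also set in the name.
    // Name bytes missing past the end of the name are treated as zero.
    static boolean matches(byte[] name, byte[] bitmap) {
        for (int i = 0; i < bitmap.length; i++) {
            byte nameByte = i < name.length ? name[i] : 0;
            if ((nameByte & bitmap[i]) != bitmap[i]) {
                return false;
            }
        }
        return true;
    }

    // Apply the bitmap first and the count limit second, so the limit
    // counts only matching columns. A null bitmap means "no filter",
    // which mirrors the idea of an optional field.
    static List<byte[]> slice(List<byte[]> names, byte[] bitmap, int count) {
        List<byte[]> result = new ArrayList<>();
        for (byte[] name : names) {
            if (result.size() >= count) {
                break;
            }
            if (bitmap == null || matches(name, bitmap)) {
                result.add(name);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<byte[]> names = Arrays.asList(
            new byte[] {0x01}, new byte[] {0x03},
            new byte[] {0x02}, new byte[] {0x07});
        // Ask for at most two names that have bit 0x02 set.
        List<byte[]> hits = slice(names, new byte[] {0x02}, 2);
        for (byte[] hit : hits) {
            System.out.println(Arrays.toString(hit));
        }
    }
}
```

If the limit were applied before the bitmap instead, the same request would look only at the first two names and return a single match, which is presumably why the quoted message suggests filtering first.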