2009/12/27 August Zajonc <augu...@augustz.com>

> Looking at the data model a simple solution is two column families,
> one containing items as the row-key with tags as columns, and a second
> with tags as the row-key with items as columns. This gives me fast
> access at the cost of 2x the writes (cheap) and storage (also cheap).
> So not bad.

I think this is the normal model. However, there is no need to put them in separate column families; you could simply use non-overlapping keys.

There is, however, a scalability problem when a single tag has a very large number of items (or vice versa): you end up with a lot of columns under a single CF / key. As all of these need to be held in the RAM of a node during a query (and possibly other operations), this will blow up memory usage. I guess the solution may be to create a number of different keys for the same tag.

In any case, querying a very large number of items is problematic. The user will not usually want them all, so you'd need to prioritise them somehow anyway. It might therefore be sufficient to store only the "highest priority" items against a single tag key (and have other keys for the lower priority ones). How you define priority is application-specific.

Mark
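
A rough sketch of the two ideas above (non-overlapping keys, and several keys per tag), for illustration only. It is not Cassandra client code: a plain Python dict stands in for a single column family, and the "item:" / "tag:" prefixes, the BUCKETS constant, and all function names are hypothetical choices, not anything Cassandra defines.

```python
import zlib
from collections import defaultdict

# Stand-in for ONE column family: row key -> {column name: value}.
# (Assumption: a real store would be accessed through a client library.)
store = defaultdict(dict)

# Hypothetical number of row keys each tag is spread over.
BUCKETS = 4


def item_key(item):
    # The "item:" prefix keeps item rows from colliding with tag rows.
    return "item:" + item


def tag_key(tag, bucket):
    # One tag maps to BUCKETS rows, so no single row holds every item.
    return "tag:%s:%d" % (tag, bucket)


def bucket_for(item):
    # crc32 is stable across runs, unlike Python's built-in hash() for str.
    return zlib.crc32(item.encode("utf-8")) % BUCKETS


def add_tag(item, tag):
    # Two writes per tagging: item -> tag, and tag bucket -> item.
    store[item_key(item)][tag] = ""
    store[tag_key(tag, bucket_for(item))][item] = ""


def tags_of(item):
    # Single row read.
    return sorted(store.get(item_key(item), {}))


def items_with(tag):
    # Reads every bucket row for the tag and merges the column names.
    items = []
    for bucket in range(BUCKETS):
        items.extend(store.get(tag_key(tag, bucket), {}))
    return sorted(items)
```

After add_tag("photo1", "cat") and add_tag("photo2", "cat"), items_with("cat") reads up to BUCKETS rows instead of one wide row. Replacing bucket_for with an application-specific priority function would give the "highest priority items under one key" variant, so that a typical query only needs to read the first key.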