Re: Design Pattern - Tag Cloud / Inverted Index

Mark Robson Sun, 27 Dec 2009 08:39:23 -0800

2009/12/27 August Zajonc <augu...@augustz.com>

> Looking at the data model a simple solution is two column families,
> one containing items as the row-key with tags as columns, and a second
> with tags as the row-key with items as columns. This gives me fast
> access at the cost of 2x the writes (cheap) and storage (also cheap).
> So not bad.
>


I think this is the normal model.

However, there is no need to put them in separate column families; you could
simply use non-overlapping keys within a single one.
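
As a rough illustration of the non-overlapping key idea, here is a minimal
sketch. A plain Python dict stands in for a single column family (it is not a
real Cassandra client), and the "item:" / "tag:" prefixes and all function
names are assumptions made up for the example.

```python
from collections import defaultdict

# One dict of dicts stands in for a single column family:
# row key -> {column name -> value}. Purely illustrative.
store = defaultdict(dict)

def item_key(item_id):
    # Hypothetical prefix that keeps item rows apart from tag rows.
    return "item:" + item_id

def tag_key(tag):
    # Hypothetical prefix for the inverted-index rows.
    return "tag:" + tag

def add_tag(item_id, tag):
    # Two writes per tagging: forward index and inverted index.
    store[item_key(item_id)][tag] = ""
    store[tag_key(tag)][item_id] = ""

def tags_of(item_id):
    # Column names of the item row are the item's tags.
    return sorted(store.get(item_key(item_id), {}))

def items_with(tag):
    # Column names of the tag row are the tagged items.
    return sorted(store.get(tag_key(tag), {}))

add_tag("post1", "cassandra")
add_tag("post1", "nosql")
add_tag("post2", "cassandra")
```

Because the two prefixes can never collide, an item and a tag with the same
name still land in different rows.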

There is, however, a scalability problem when a single tag has a very large
number of items, or vice versa: you will have a lot of columns in a single
CF / key. As the whole row needs to be held in the RAM of a node during a
query (and possibly other operations), it will blow up that node's memory
usage.

I guess the solution may be to create a number of different keys for the
same tag.
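
One way this could look, as a sketch only: spread the items of a tag over a
fixed number of bucket keys, chosen by hashing the item id. The bucket count,
the key layout and the function names are all assumptions for illustration,
and a dict again stands in for the column family.

```python
import zlib
from collections import defaultdict

NUM_BUCKETS = 16  # hypothetical; tune to the expected row size

# row key -> {column name -> value}, standing in for a column family.
store = defaultdict(dict)

def bucket_key(tag, item_id):
    # crc32 is stable across processes, unlike Python's built-in hash().
    bucket = zlib.crc32(item_id.encode("utf-8")) % NUM_BUCKETS
    return "tag:%s:%02d" % (tag, bucket)

def add_item(tag, item_id):
    # Each item lands in exactly one of the tag's bucket rows.
    store[bucket_key(tag, item_id)][item_id] = ""

def all_bucket_keys(tag):
    return ["tag:%s:%02d" % (tag, b) for b in range(NUM_BUCKETS)]

def items_with(tag):
    # Read one bucket row at a time, so no single row has to hold
    # every item for the tag.
    for key in all_bucket_keys(tag):
        for item_id in sorted(store.get(key, {})):
            yield item_id

for n in range(1000):
    add_item("popular", "item%d" % n)
```

The cost is that reading the full tag now takes up to NUM_BUCKETS row reads
instead of one.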

In any case, querying a very large number of items is problematic. The user
will not usually want them all, so you'd need to prioritise them somehow
anyway. It might therefore be sufficient to store only the "highest priority"
items against a single tag key (and have other keys for the lower priority
ones). How you define priority is application-specific.
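
A minimal sketch of that split, assuming a numeric priority and a fixed cap on
the "hot" row. The ":hot" / ":rest" key suffixes, the cap and the function
names are invented for the example, and the read-modify-write below would not
be atomic against a real store.

```python
HOT_LIMIT = 3  # hypothetical cap on the high-priority row

# row key -> {item id -> priority}, standing in for a column family.
store = {}

def add_item(tag, item_id, priority):
    hot = store.setdefault("tag:%s:hot" % tag, {})
    rest = store.setdefault("tag:%s:rest" % tag, {})
    hot[item_id] = priority
    if len(hot) > HOT_LIMIT:
        # Demote the lowest-priority item to the overflow row.
        lowest = min(hot, key=hot.get)
        rest[lowest] = hot.pop(lowest)

def top_items(tag):
    # The common query touches only the small, bounded row.
    hot = store.get("tag:%s:hot" % tag, {})
    return sorted(hot, key=hot.get, reverse=True)

add_item("cassandra", "a", 5)
add_item("cassandra", "b", 1)
add_item("cassandra", "c", 9)
add_item("cassandra", "d", 7)
```

Here top_items("cassandra") reads only the bounded row; the overflow row is
touched only if someone asks for the long tail.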

Mark
