Quick question about Facebook's indexing strategy: given that all of the columns within a supercolumn must be serialized/deserialized together, and therefore must fit in memory, is there a point at which individual Facebook users could start causing problems if they have a lot of messages? The passage below is copied from section 6.1 of the lakshman-ladis2009 paper.

"There are two kinds of search features that are enabled today (a) term search (b) interactions - given the name of a person return all messages that the user might have ever sent or received from that person. The schema consists of two column families. For query (a) the user id is the key and the words that make up the message become the super column. Individual message identifiers of the messages that contain the word become the columns within the super column. For query (b) again the user id is the key and the recipients id's are the super columns. For each of these super columns the individual message identifiers are the columns."

If the user sent 10,000 messages to another user over a few years, wouldn't they have 10,000 message ids in a supercolumn? I guess that's only about 80kB, but certainly if they weren't partitioning by user, they would run into problems, so it may not be a good example for large non-partitioned indexes.

Or maybe I still don't understand how supercolumns work.

Matt

On Wed, Feb 24, 2010 at 7:52 PM, Nathan McCall <n...@vervewireless.com> wrote:

> The following paper on the Articles and Presentations section of the
> Cassandra wiki describes Facebook's inbox search implementation:
> http://www.cs.cornell.edu/projects/ladis2009/papers/lakshman-ladis2009.pdf
>
> -Nate
>
> On Wed, Feb 24, 2010 at 4:45 PM, Mohammad Abed <mohammad.a...@gmail.com>
> wrote:
> > Either of these solutions used in any production environment?
> >
> > On Wed, Feb 24, 2010 at 3:54 PM, Brandon Williams <dri...@gmail.com>
> > wrote:
> >>
> >> On Wed, Feb 24, 2010 at 5:41 PM, Mohammad Abed <mohammad.a...@gmail.com>
> >> wrote:
> >>>
> >>> Any suggestions on how to pursue full text search with Cassandra, what
> >>> options are out there?
> >>
> >> Also: http://github.com/tjake/Lucandra
> >> -Brandon
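
To put a number on the 10,000-message case above, here is a rough Python sketch of the "interactions" layout as I read it from the quoted passage (user id -> recipient id -> message ids) plus the size estimate. This is only an in-memory model, not Cassandra's API: the names interactions, record_message and supercolumn_bytes are made up for illustration, and the 8 bytes per message id and the per-column overhead parameter are assumptions, not figures from the paper.

```python
from collections import defaultdict

# Hypothetical in-memory model of the "interactions" column family:
# row key (user id) -> super column (recipient id) -> columns (message ids).
interactions = defaultdict(lambda: defaultdict(set))


def record_message(user_id, recipient_id, message_id):
    """Add one message id as a column under the recipient's super column."""
    interactions[user_id][recipient_id].add(message_id)


def supercolumn_bytes(num_columns, bytes_per_id=8, overhead_per_column=0):
    """Rough size of one super column that must be deserialized as a unit.

    bytes_per_id=8 is an assumption (a 64-bit message id);
    overhead_per_column stands in for whatever extra bytes each stored
    column carries and is left at 0 to match the back-of-envelope figure.
    """
    return num_columns * (bytes_per_id + overhead_per_column)


# One user sends 10,000 messages to the same recipient.
for message_id in range(10_000):
    record_message("user_a", "user_b", message_id)

column_count = len(interactions["user_a"]["user_b"])
estimated_size = supercolumn_bytes(column_count)
print(column_count, "columns,", estimated_size, "bytes")
```

The estimate grows linearly with the number of message ids under a single recipient, so the thing to watch is the largest single super column rather than the total size of a user's row.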