Re: Scalable data model for a Metadata database

Jonathan Ellis Tue, 09 Feb 2010 08:37:53 -0800

On Tue, Feb 9, 2010 at 10:01 AM, Jared winick <jaredwin...@gmail.com> wrote:
> Somehow I need to partition the data better.  Would a recommendation
> be to “split” the “sex” key into multiple keys? For example I could
> append the year and month to the key (“sex_022010”) to partition the
> data by the month it was inserted.


That's one possibility.  Another would be to kill two birds with one
stone and add the age to that key, so you'd have male_20 (probably
better: male_1990), etc.
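
A rough sketch of that key scheme, in plain Python rather than any
Cassandra client. The dict is only a stand-in for an index column
family, and index_key / index_person are hypothetical helper names.
Birth year is presumably the better suffix because it never changes,
while an age-based key would need re-indexing every year.

```python
from collections import defaultdict


def index_key(sex, birth_year):
    """Build a composite row key such as 'male_1990' (hypothetical helper)."""
    return f"{sex}_{birth_year}"


# Stand-in for an index column family: row key -> set of person ids.
index = defaultdict(set)


def index_person(person_id, sex, birth_year):
    """Record a person under the composite sex_birthyear row."""
    index[index_key(sex, birth_year)].add(person_id)


index_person("p1", "male", 1990)
index_person("p2", "male", 1985)
index_person("p3", "female", 1990)

# One row lookup answers "males born in 1990" without scanning a huge "sex" row.
males_born_1990 = index[index_key("male", 1990)]
```

Each distinct (sex, birth year) pair gets its own row, so no single
row grows with the whole population.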

Fundamentally, TANSTAAFL: if you need to scale queries w/ lots of
criteria like this, you will have to choose one (sometimes more than
one) of these options:

 - have a lot of machines so you can parallelize brute force queries,
e.g. w/ Hadoop
 - precompute specific "indexes" like sex_birthdate above
   - note, with supercolumns you can also materialize the whole
"person" in subcolumns, rather than doing an extra lookup for each
index hit
 - use less-specific indexes (e.g. separate sex & birthdate indexes to
continue the example) and do more work on the client
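
To make the trade-off between the last two options concrete, here is
a minimal sketch in plain Python. Dicts stand in for column families,
the sample people are made up, and all function names are
hypothetical. It shows a precomputed sex_birthdate index, the same
index with the whole "person" materialized under each hit (roughly
what supercolumns would give you), and two less-specific indexes
intersected on the client.

```python
from collections import defaultdict

# Made-up sample data standing in for the "person" rows.
people = {
    "p1": {"sex": "male", "birth_year": 1990},
    "p2": {"sex": "male", "birth_year": 1985},
    "p3": {"sex": "female", "birth_year": 1990},
}

sex_index = defaultdict(set)         # less-specific index: sex -> ids
birth_index = defaultdict(set)       # less-specific index: birth year -> ids
sex_birth_index = defaultdict(set)   # precomputed index: "male_1990" -> ids
materialized = defaultdict(dict)     # supercolumn-like: key -> {id: person}

for pid, person in people.items():
    key = f'{person["sex"]}_{person["birth_year"]}'
    sex_index[person["sex"]].add(pid)
    birth_index[person["birth_year"]].add(pid)
    sex_birth_index[key].add(pid)
    materialized[key][pid] = dict(person)


def query_precomputed(sex, birth_year):
    """One index read; each hit still needs a lookup to fetch the person."""
    return set(sex_birth_index.get(f"{sex}_{birth_year}", set()))


def query_materialized(sex, birth_year):
    """One index read that already carries the full person records."""
    return dict(materialized.get(f"{sex}_{birth_year}", {}))


def query_client_side(sex, birth_year):
    """Two broader index reads, intersected by the client."""
    return sex_index.get(sex, set()) & birth_index.get(birth_year, set())
```

All three return the same people for ("male", 1990). What differs is
how much is written at insert time versus how much is read and
filtered at query time.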

-Jonathan
