On Tue, Feb 9, 2010 at 10:01 AM, Jared winick <jaredwin...@gmail.com> wrote:
> Somehow I need to partition the data better. Would a recommendation
> be to “split” the “sex” key into multiple keys? For example I could
> append the year and month to the key (“sex_022010”) to partition the
> data by the month it was inserted.
That's one possibility. Another would be to kill two birds with one stone and add the age to that key, so you'd have male_20 (or, probably better, male_1990), etc.

Fundamentally TANSTAAFL, and if you need to scale queries w/ lots of criteria like this you will have to choose one (sometimes more than one) of these options:

- have a lot of machines so you can parallelize brute-force queries, e.g. w/ Hadoop
- precompute specific "indexes" like the sex_birthdate key above
  - note: with supercolumns you can also materialize the whole "person" in subcolumns, rather than doing an extra lookup for each index hit
- use less-specific indexes (e.g. separate sex & birthdate indexes, to continue the example) and do more work on the client

-Jonathan
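A minimal sketch of the key schemes discussed above, in Python. It assumes a plain in-memory dict (row key -> set of person ids) stands in for the index rows; no Cassandra client API is used, and every function name here is hypothetical. It contrasts the precomputed composite key (one row read answers the query) with separate less-specific indexes that the client intersects.

```python
from collections import defaultdict
from datetime import date


def month_bucket_key(sex, inserted):
    # Month-partitioned key in MMYYYY form, as in the quoted mail:
    # ("male", date(2010, 2, 9)) -> "male_022010"
    return f"{sex}_{inserted:%m%Y}"


def sex_birthyear_key(sex, birth_year):
    # Precomputed composite index key, e.g. "male_1990"
    return f"{sex}_{birth_year}"


def build_indexes(people):
    # Hypothetical stand-in for index rows: each key maps to the set of
    # person ids that a real index row would hold as column names.
    composite = defaultdict(set)
    by_sex = defaultdict(set)
    by_year = defaultdict(set)
    for pid, sex, birth_year in people:
        composite[sex_birthyear_key(sex, birth_year)].add(pid)
        by_sex[sex].add(pid)
        by_year[str(birth_year)].add(pid)
    return composite, by_sex, by_year


def query_precomputed(composite, sex, birth_year):
    # Specific index: a single row lookup answers the whole query.
    return set(composite.get(sex_birthyear_key(sex, birth_year), set()))


def query_client_side(by_sex, by_year, sex, birth_year):
    # Less-specific indexes: read two (larger) rows, intersect on the client.
    return by_sex.get(sex, set()) & by_year.get(str(birth_year), set())


people = [("p1", "male", 1990), ("p2", "female", 1990), ("p3", "male", 1985)]
composite, by_sex, by_year = build_indexes(people)
print(query_precomputed(composite, "male", 1990))
print(query_client_side(by_sex, by_year, "male", 1990))
```

Both queries return the same ids; the trade-off is that the composite index costs extra writes per insert, while the client-side intersection reads more data per query.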