Thanks for the specific suggestions, Jonathan. I really appreciate it.

On Tue, Feb 9, 2010 at 9:37 AM, Jonathan Ellis <jbel...@gmail.com> wrote:
> On Tue, Feb 9, 2010 at 10:01 AM, Jared winick <jaredwin...@gmail.com> wrote:
>> Somehow I need to partition the data better.  Would a recommendation
>> be to “split” the “sex” key into multiple keys?  For example I could
>> append the year and month to the key (“sex_022010”) to partition the
>> data by the month it was inserted.
>
> That's one possibility.  Another would be to kill two birds with one
> stone and add the age to that key, so you'd have male_20 (probably
> better: male_1990), etc.
>
> Fundamentally TANSTAAFL and if you need to scale queries w/ lots of
> criteria like this you will have to choose (sometimes from more than
> one of) these options:
>
> - have a lot of machines so you can parallelize brute force queries,
>   e.g. w/ Hadoop
> - precompute specific "indexes" like sex_birthdate above
>   - note, with supercolumns you can also materialize the whole
>     "person" in subcolumns, rather than doing an extra lookup for each
>     index hit
> - use less-specific indexes (e.g. separate sex & birthdate indexes to
>   continue the example) and do more work on the client
>
> -Jonathan
>
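
For anyone following the thread, here is a minimal sketch of two of the options quoted above: a precomputed composite index (row keys like male_1990) versus separate, less-specific indexes that the client intersects. It uses plain Python dictionaries as a stand-in for column families and is not the Cassandra client API. All names here (add_person, find_precomputed, find_client_side, the sample ids) are made up for illustration.

```python
from collections import defaultdict

# Stand-ins for column families: row key -> set of person ids.
# In a real store each id would be a column name inside the index row.
composite_index = defaultdict(set)   # e.g. "male_1990" -> ids
sex_index = defaultdict(set)         # e.g. "male" -> ids
birth_year_index = defaultdict(set)  # e.g. "1990" -> ids


def composite_key(sex, birth_year):
    """Build a row key such as 'male_1990' (birth year rather than age,
    so the key does not go stale as people get older)."""
    return f"{sex}_{birth_year}"


def add_person(person_id, sex, birth_year):
    """Write the person id into every index at insert time."""
    composite_index[composite_key(sex, birth_year)].add(person_id)
    sex_index[sex].add(person_id)
    birth_year_index[str(birth_year)].add(person_id)


def find_precomputed(sex, birth_year):
    """One row read: the index already matches both criteria."""
    return sorted(composite_index.get(composite_key(sex, birth_year), set()))


def find_precomputed_range(sex, first_year, last_year):
    """A range of birth years becomes one row read per year."""
    ids = set()
    for year in range(first_year, last_year + 1):
        ids |= composite_index.get(composite_key(sex, year), set())
    return sorted(ids)


def find_client_side(sex, birth_year):
    """Two less-specific row reads, intersected on the client."""
    by_sex = sex_index.get(sex, set())
    by_year = birth_year_index.get(str(birth_year), set())
    return sorted(by_sex & by_year)


# Hypothetical sample data.
add_person("p1", "male", 1990)
add_person("p2", "female", 1990)
add_person("p3", "male", 1991)

print(find_precomputed("male", 1990))
print(find_precomputed_range("male", 1990, 1991))
print(find_client_side("male", 1990))
```

The trade-off is the one described in the quoted message: the composite index costs an extra write per person and only answers the query it was built for, while the separate indexes are reusable but pull back larger rows and leave the intersection work to the client.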