Thanks for the specific suggestions, Jonathan. I really appreciate it.

On Tue, Feb 9, 2010 at 9:37 AM, Jonathan Ellis <jbel...@gmail.com> wrote:
> On Tue, Feb 9, 2010 at 10:01 AM, Jared winick <jaredwin...@gmail.com> wrote:
>> Somehow I need to partition the data better.  Would a recommendation
>> be to “split” the “sex” key into multiple keys? For example I could
>> append the year and month to the key (“sex_022010”) to partition the
>> data by the month it was inserted.
>
> That's one possibility.  Another would be to kill two birds with one
> stone and add the age to that key, so you'd have male_20 (probably
> better: male_1990), etc.
>
> Fundamentally TANSTAAFL and if you need to scale queries w/ lots of
> criteria like this you will have to choose (sometimes from more than
> one of) these options:
>
>  - have a lot of machines so you can parallelize brute force queries,
> e.g. w/ Hadoop
>  - precompute specific "indexes" like sex_birthdate above
>   - note, with supercolumns you can also materialize the whole
> "person" in subcolumns, rather than doing an extra lookup for each
> index hit
>  - use less-specific indexes (e.g. separate sex & birthdate indexes to
> continue the example) and do more work on the client
>
> -Jonathan
>
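
A minimal sketch of the precomputed "sex_birthyear" index idea quoted above, written in plain Python with an in-memory dict standing in for the store. It does not use any Cassandra client API; the names index_key, PersonIndex, add, lookup and lookup_year_range are hypothetical and exist only for illustration. Each index row is keyed like "male_1990" and holds the whole person record, loosely mirroring the suggestion to materialize the person in subcolumns so that no extra lookup is needed per index hit.

```python
from collections import defaultdict


def index_key(sex, birth_year):
    """Build a composite index row key such as 'male_1990' (hypothetical layout)."""
    return f"{sex}_{birth_year}"


class PersonIndex:
    """In-memory stand-in for a precomputed index: row key -> {person_id: person}."""

    def __init__(self):
        self.rows = defaultdict(dict)

    def add(self, person_id, sex, birth_year, person):
        # Store a copy of the whole person under the index row, so a hit
        # on the index does not require a second lookup by person_id.
        self.rows[index_key(sex, birth_year)][person_id] = dict(person)

    def lookup(self, sex, birth_year):
        # A single-row read: everyone with this sex and birth year.
        return self.rows.get(index_key(sex, birth_year), {})

    def lookup_year_range(self, sex, first_year, last_year):
        # One row read per year (inclusive range), merged on the client.
        result = {}
        for year in range(first_year, last_year + 1):
            result.update(self.lookup(sex, year))
        return result
```

With this layout a query such as "males born between 1985 and 1990" becomes six small row reads instead of one scan over a single large "sex" row; the cost is that every insert must also write the index row.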
