Hi Jared,
You might want to look at graph databases (HyperGraphDB or Neo4j, for example)
for use cases like this.
What it seems like you are asking for is a semantic knowledge base à la
freebase.com.

Tools like Protégé (protege.stanford.edu/) and Gremlin (gremlin.tinkerpop.com)
are helpful for this kind of thing as well.

The other issue you are going to encounter is when you want to link up two things.

For example, marriage:
find all people whose sex == ‘male’ and age >= 20 and age <= 29 and who are married
to someone called Michelle who is older than 27.
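
To make the linking problem concrete, here is a rough sketch of that marriage query as a one-hop traversal. It is plain Python over in-memory dicts, not the API of any particular graph database; the ids, the sample people, and the find_husbands helper are all made up for illustration.

```python
# Hypothetical sample data: person id -> attributes.
people = {
    "bob":       {"name": "Bob",      "sex": "male",   "age": 25},
    "michelle1": {"name": "Michelle", "sex": "female", "age": 30},
    "carl":      {"name": "Carl",     "sex": "male",   "age": 22},
    "michelle2": {"name": "Michelle", "sex": "female", "age": 26},
    "dave":      {"name": "Dave",     "sex": "male",   "age": 41},
    "michelle3": {"name": "Michelle", "sex": "female", "age": 40},
}

# Hypothetical "married to" edges, stored in both directions.
married_to = {
    "bob": "michelle1", "michelle1": "bob",
    "carl": "michelle2", "michelle2": "carl",
    "dave": "michelle3", "michelle3": "dave",
}


def find_husbands(people, married_to):
    """Males aged 20-29 married to someone called Michelle older than 27."""
    matches = []
    for pid, person in people.items():
        # Filter on the person's own attributes first.
        if person["sex"] != "male" or not (20 <= person["age"] <= 29):
            continue
        # Follow the marriage edge; skip people with no spouse recorded.
        spouse_id = married_to.get(pid)
        if spouse_id is None:
            continue
        # Filter on the attributes of the linked person.
        spouse = people[spouse_id]
        if spouse["name"].lower() == "michelle" and spouse["age"] > 27:
            matches.append(pid)
    return matches


print(find_husbands(people, married_to))
```

The second filter depends on data reached through the link, which is the part a precomputed key like male_1990 cannot answer on its own and a graph store is designed to handle.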

HTH
Ian

On Feb 10, 2010, at 3:51 AM, Jared winick wrote:

> Thanks for the specific suggestions Jonathan, I really appreciate it.
> 
> On Tue, Feb 9, 2010 at 9:37 AM, Jonathan Ellis <jbel...@gmail.com> wrote:
>> On Tue, Feb 9, 2010 at 10:01 AM, Jared winick <jaredwin...@gmail.com> wrote:
>>> Somehow I need to partition the data better.  Would a recommendation
>>> be to “split” the “sex” key into multiple keys? For example I could
>>> append the year and month to the key (“sex_022010”) to partition the
>>> data by the month it was inserted.
>> 
>> That's one possibility.  Another would be to kill two birds with one
>> stone and add the age to that key, so you'd have male_20 (probably
>> better: male_1990), etc.
>> 
>> Fundamentally TANSTAAFL and if you need to scale queries w/ lots of
>> criteria like this you will have to choose (sometimes from more than
>> one of) these options:
>> 
>>  - have a lot of machines so you can parallelize brute force queries,
>> e.g. w/ Hadoop
>>  - precompute specific "indexes" like sex_birthdate above
>>   - note, with supercolumns you can also materialize the whole
>> "person" in subcolumns, rather than doing an extra lookup for each
>> index hit
>>  - use less-specific indexes (e.g. separate sex & birthdate indexes to
>> continue the example) and do more work on the client
>> 
>> -Jonathan
>> 
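
For reference, two of the options Jonathan lists above (a precomputed sex_birthdate index versus separate, less-specific indexes combined on the client) could be sketched roughly like this. Plain Python dicts stand in for Cassandra rows here; the key format, the sample ids, and the function names are assumptions for illustration only, not Cassandra API calls.

```python
from collections import defaultdict

# Each dict maps an index key to the set of person ids stored under it.
composite_index = defaultdict(set)   # e.g. "male_1990" -> {ids}
sex_index = defaultdict(set)         # e.g. "male" -> {ids}
birth_year_index = defaultdict(set)  # e.g. 1990 -> {ids}


def index_person(person_id, sex, birth_year):
    """Write one person into all three illustrative indexes."""
    composite_index[f"{sex}_{birth_year}"].add(person_id)
    sex_index[sex].add(person_id)
    birth_year_index[birth_year].add(person_id)


def query_composite(sex, birth_years):
    """Precomputed index: one lookup per year, no client-side filtering."""
    result = set()
    for year in birth_years:
        result |= composite_index.get(f"{sex}_{year}", set())
    return result


def query_separate(sex, birth_years):
    """Less-specific indexes: fetch both sides, intersect on the client."""
    born_in_range = set()
    for year in birth_years:
        born_in_range |= birth_year_index.get(year, set())
    return sex_index.get(sex, set()) & born_in_range


# Hypothetical sample data.
index_person("p1", "male", 1985)
index_person("p2", "male", 1990)
index_person("p3", "female", 1985)
index_person("p4", "male", 1970)

years = range(1981, 1991)
print(sorted(query_composite("male", years)))
print(sorted(query_separate("male", years)))
```

Both queries return the same ids. The composite version does more work at write time and reads only matching rows, while the separate version may pull the whole "male" row before intersecting, which is the extra client-side work mentioned above.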

--
Ian Holsman
i...@holsman.net


