Hi Jared. You might want to look at graph databases (HyperGraphDB or Neo4j, for example) for use cases like this. What it seems like you are asking for is a semantic knowledge base à la freebase.com.
Tools like Protégé (protege.stanford.edu/) and Gremlin (gremlin.tinkerpop.com) are helpful for this kind of thing as well.

The other issue you are going to encounter is when you want to link up 2 things, for example marriage: find all people whose sex == 'male' and age >= 20 and age <= 29 and who are married to people called Michelle who are older than 27.

HTH
Ian

On Feb 10, 2010, at 3:51 AM, Jared winick wrote:

> Thanks for the specific suggestions Jonathan, I really appreciate it.
>
> On Tue, Feb 9, 2010 at 9:37 AM, Jonathan Ellis <jbel...@gmail.com> wrote:
>> On Tue, Feb 9, 2010 at 10:01 AM, Jared winick <jaredwin...@gmail.com> wrote:
>>> Somehow I need to partition the data better. Would a recommendation
>>> be to “split” the “sex” key into multiple keys? For example I could
>>> append the year and month to the key (“sex_022010”) to partition the
>>> data by the month it was insert.
>>
>> That's one possibility. Another would be to kill two birds with one
>> stone and add the age to that key, so you'd have male_20 (probably
>> better: male_1990), etc.
>>
>> Fundamentally TANSTAAFL and if you need to scale queries w/ lots of
>> criteria like this you will have to choose (sometimes from more than
>> one of) these options:
>>
>> - have a lot of machines so you can parallelize brute force queries,
>>   e.g. w/ Hadoop
>> - precompute specific "indexes" like sex_birthdate above
>>   - note, with supercolumns you can also materialize the whole
>>     "person" in subcolumns, rather than doing an extra lookup for each
>>     index hit
>> - use less-specific indexes (e.g. separate sex & birthdate indexes to
>>   continue the example) and do more work on the client
>>
>> -Jonathan
>>

--
Ian Holsman
i...@holsman.net
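
For anyone reading the thread later, here is a rough sketch of two of the options discussed above: a precomputed composite index (keys like male_1985) versus separate sex and birth-year indexes intersected on the client, followed by the marriage "link" step. It uses plain in-memory Python dicts rather than any real Cassandra or graph database API, and every record, field name and function name in it is hypothetical. Age is approximated as the current year minus the birth year.

```python
from collections import defaultdict

# Hypothetical sample records; ids, names and fields are made up for illustration.
PEOPLE = {
    "p1": {"name": "Adam", "sex": "male", "birth_year": 1985, "spouse": "p2"},
    "p2": {"name": "Michelle", "sex": "female", "birth_year": 1980, "spouse": "p1"},
    "p3": {"name": "Bob", "sex": "male", "birth_year": 1988, "spouse": "p4"},
    "p4": {"name": "Michelle", "sex": "female", "birth_year": 1984, "spouse": "p3"},
    "p5": {"name": "Carl", "sex": "male", "birth_year": 1960, "spouse": None},
}


def build_composite_index(people):
    """Precomputed index keyed like 'male_1985' (sex plus birth year)."""
    index = defaultdict(set)
    for pid, person in people.items():
        index[f"{person['sex']}_{person['birth_year']}"].add(pid)
    return index


def build_separate_indexes(people):
    """Less specific indexes: one by sex, one by birth year."""
    by_sex, by_year = defaultdict(set), defaultdict(set)
    for pid, person in people.items():
        by_sex[person["sex"]].add(pid)
        by_year[person["birth_year"]].add(pid)
    return by_sex, by_year


def birth_years(min_age, max_age, current_year):
    # Age is approximated as current_year - birth_year.
    return range(current_year - max_age, current_year - min_age + 1)


def men_in_twenties_composite(index, current_year):
    """One lookup per birth year against the composite index."""
    hits = set()
    for year in birth_years(20, 29, current_year):
        hits |= index.get(f"male_{year}", set())
    return hits


def men_in_twenties_separate(by_sex, by_year, current_year):
    """Union the year buckets, then intersect with the sex bucket on the client."""
    in_range = set()
    for year in birth_years(20, 29, current_year):
        in_range |= by_year.get(year, set())
    return by_sex.get("male", set()) & in_range


def married_to_michelle_over_27(candidates, people, current_year):
    """The 'link' step: one extra lookup per candidate to inspect the spouse."""
    result = set()
    for pid in candidates:
        spouse = people.get(people[pid]["spouse"])
        if (spouse and spouse["name"] == "Michelle"
                and current_year - spouse["birth_year"] > 27):
            result.add(pid)
    return result


index = build_composite_index(PEOPLE)
candidates = men_in_twenties_composite(index, 2010)
print(sorted(married_to_michelle_over_27(candidates, PEOPLE, 2010)))  # ['p1']
```

Both index styles return the same candidates here; the difference is where the work happens. The spouse check is the part that costs an extra lookup per hit in a key-value layout, which is the kind of traversal a graph database is built for.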