Hi All,

I am new to Cassandra and am evaluating it as part of a trade study of NoSQL solutions. I have done some prototyping and read through the mailing list and blog posts, but I was hoping to get some basic suggestions about designing a scalable data model.

My example use case is storing and querying metadata about Person objects. A Person object has a flexible set of metadata such as sex, firstName, lastName, age, location, etc. I need to be able to store billions of Person objects, and I need to be able to execute queries such as "Find all people whose sex == 'male' and age >= 20 and age <= 29."

My naive first attempt had 2 ColumnFamilies:
ColumnFamily: Person
  Key: Person ID (a unique ID for each Person object)
  Columns: the metadata names and values for that Person

  Example:
  Person : {
    000-00-0000 : { sex : male, firstName : Jared, ... }
  }

ColumnFamily: Metadata
  Key: metadata name
  SuperColumns: metadata values
  SubColumns: Person IDs with that metadata value

  Example:
  Metadata : {
    sex : {
      male : {
        000-00-0000 : 000-00-0000,
        000-00-0001 : 000-00-0001,
        ...
      },
      female : { ... }
    },
    firstName : { ... }
  }

This model works great for small datasets, but I am concerned that it has big issues as the number of records grows. For example, in the Metadata ColumnFamily, under the key "sex" and the SuperColumn "male", the number of subcolumns is going to get huge: roughly 50% of the number of records in the database. Somehow I need to partition the data better.

Would one recommendation be to "split" the "sex" key into multiple keys? For example, I could append the month and year to the key ("sex_022010") to partition the data by the month it was inserted.

Any other basic recommendations from those of you with experience? Thanks a lot for any suggestions.

Jared Winick
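P.S. To pin down exactly what query semantics I am after, here is the target behavior expressed over plain Python dicts standing in for Person rows. This is illustrative only (no Cassandra API calls); the data values are made up:

```python
# Person rows keyed by Person ID, each with a flexible set of metadata.
people = {
    "000-00-0000": {"sex": "male", "firstName": "Jared", "age": 25},
    "000-00-0001": {"sex": "female", "firstName": "Ann", "age": 31},
    "000-00-0002": {"sex": "male", "firstName": "Bob", "age": 41},
}

def find(people, sex, min_age, max_age):
    """Return IDs of people matching sex and an inclusive age range,
    i.e. the query "sex == s and min_age <= age <= max_age"."""
    return sorted(
        pid for pid, person in people.items()
        if person["sex"] == sex and min_age <= person["age"] <= max_age
    )

print(find(people, "male", 20, 29))  # → ['000-00-0000']
```

The question is how to support this kind of query efficiently at billions of rows, rather than by scanning every Person.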
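P.P.S. A quick sketch of the key-splitting idea in Python, in case it helps the discussion. The helper names are mine, not any Cassandra API; the point is just that each month's inserts would land under a separate row key, and a read would then fan out across all month buckets of interest:

```python
from datetime import date

def bucketed_key(metadata_name, insert_date):
    """Compose a row key like "sex_022010": metadata name plus the
    MMYYYY of the month the record was inserted."""
    return "%s_%02d%04d" % (metadata_name, insert_date.month, insert_date.year)

def keys_for_months(metadata_name, months):
    """Row keys a reader would have to query to cover the given months."""
    return [bucketed_key(metadata_name, m) for m in months]

print(bucketed_key("sex", date(2010, 2, 15)))  # → sex_022010
print(keys_for_months("sex", [date(2010, 1, 1), date(2010, 2, 1)]))
```

This caps the size of any one row, but every query now has to hit one row per month, so I am not sure it is the right trade-off.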