Good afternoon,

I'm designing my data model for Cassandra from scratch, which means I can
fine-tune it for performance.

At the moment I'm having trouble choosing between two column families and one
super column family. I'll illustrate with an example.

Sector: defines a place and has one or two properties.
Entry: an entry bound to a sector; it is simply some text plus a few
properties.

I can model this with a super column family:

sectors { // super column family
    sector1 {
        uid1 {
            text: a text
            user: joop
        }
        uid2 {
            text: more text
            user: piet
        }
    }
    sector2 {
        uid10 {
            text: even more text
            user: marie
        }
    }
}

But I can also model this with two column families:

sectors { // column family
    sector1 {
        textid1: null
        textid2: null
    }
    sector2 {
        textid4: null
    }
}

texts { // column family
    textid1 {
        text: a text
        user: joop
    }
    textid2 {
        text: more text
        user: piet
    }
}

With the super column family I can retrieve the list of texts for a specific
sector with a single request to Cassandra.
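To make the request count concrete, here is a minimal in-memory sketch of the super column family read path. `FakeSuperCF` and `get_row` are hypothetical names; this is not a Cassandra client API, it only mirrors the layout above (row key -> super column -> sub-columns) and counts each call as one round trip.

```python
from typing import Dict

Columns = Dict[str, str]


class FakeSuperCF:
    """Hypothetical in-memory stand-in for the 'sectors' super column family:
    row key -> super column (uid) -> sub-columns. Each call is one round trip."""

    def __init__(self, rows: Dict[str, Dict[str, Columns]]) -> None:
        self.rows = rows
        self.requests = 0  # simulated round trips

    def get_row(self, row_key: str) -> Dict[str, Columns]:
        """One request returns the whole sector row with every entry nested inside."""
        self.requests += 1
        return self.rows.get(row_key, {})


sectors = FakeSuperCF({
    "sector1": {
        "uid1": {"text": "a text", "user": "joop"},
        "uid2": {"text": "more text", "user": "piet"},
    },
    "sector2": {"uid10": {"text": "even more text", "user": "marie"}},
})

entries = sectors.get_row("sector1")
print(sorted(e["user"] for e in entries.values()))  # ['joop', 'piet']
print(sectors.requests)                             # 1
```

The count says nothing about latency by itself; it only shows that all entries for a sector come back from one row read.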

With the two column families I need to send two requests to Cassandra:
1. Give me all text IDs for sector s. (returns x, y, z)
2. Give me all texts with IDs x, y, z.
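The two-step read can be sketched the same way. `FakeStore`, `get_text_ids` and `multiget_texts` are hypothetical names for an in-memory model, not a real client API; step 2 assumes the client can fetch several row keys in one multi-key call, so the whole lookup costs two round trips rather than one per text.

```python
from typing import Dict, List

Columns = Dict[str, str]


class FakeStore:
    """Hypothetical in-memory stand-in for a keyspace with two standard column
    families. Each method call counts as one simulated round trip."""

    def __init__(self) -> None:
        # "sectors": sector key -> {textid: ""}; the column names act as an index
        # (the value is unused, 'null' in the example above).
        self.sectors: Dict[str, Columns] = {}
        # "texts": textid -> {"text": ..., "user": ...}
        self.texts: Dict[str, Columns] = {}
        self.requests = 0

    def get_text_ids(self, sector: str) -> List[str]:
        """Request 1: read the column names of one sector row."""
        self.requests += 1
        return sorted(self.sectors.get(sector, {}))

    def multiget_texts(self, text_ids: List[str]) -> Dict[str, Columns]:
        """Request 2: fetch several text rows at once (assumes a multi-key read)."""
        self.requests += 1
        return {tid: self.texts[tid] for tid in text_ids if tid in self.texts}


store = FakeStore()
store.sectors = {"sector1": {"textid1": "", "textid2": ""}, "sector2": {"textid4": ""}}
store.texts = {
    "textid1": {"text": "a text", "user": "joop"},
    "textid2": {"text": "more text", "user": "piet"},
}

ids = store.get_text_ids("sector1")       # ['textid1', 'textid2']
found = store.multiget_texts(ids)
print([found[i]["user"] for i in ids])    # ['joop', 'piet']
print(store.requests)                     # 2
```

Note that step 2 depends on the result of step 1, so the two requests cannot be sent in parallel.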

In my final application there will most likely be slightly more writes than
reads.
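Since writes are expected to outnumber reads, the write path matters as well. This sketch counts the mutations needed to add one entry in each model, using plain dicts and a hypothetical `Counter` class rather than any real client. Whether the two mutations in the two-column-family case can be combined into a single batched request depends on the client; the sketch counts them separately.

```python
from typing import Dict

Columns = Dict[str, str]


class Counter:
    """Counts simulated mutations sent to a hypothetical in-memory store."""

    def __init__(self) -> None:
        self.writes = 0


def add_entry_super(cf: Dict[str, Dict[str, Columns]], counter: Counter,
                    sector: str, uid: str, cols: Columns) -> None:
    # Super column family: the entry is nested directly under the sector row.
    counter.writes += 1
    cf.setdefault(sector, {})[uid] = dict(cols)


def add_entry_two_cf(sectors: Dict[str, Columns], texts: Dict[str, Columns],
                     counter: Counter, sector: str, text_id: str,
                     cols: Columns) -> None:
    # Two column families: write the text row first, then the index column.
    counter.writes += 1
    texts[text_id] = dict(cols)
    counter.writes += 1
    sectors.setdefault(sector, {})[text_id] = ""


super_cf: Dict[str, Dict[str, Columns]] = {}
c1 = Counter()
add_entry_super(super_cf, c1, "sector1", "uid1", {"text": "a text", "user": "joop"})

sectors_cf: Dict[str, Columns] = {}
texts_cf: Dict[str, Columns] = {}
c2 = Counter()
add_entry_two_cf(sectors_cf, texts_cf, c2, "sector1", "textid1",
                 {"text": "a text", "user": "joop"})

print(c1.writes, c2.writes)  # 1 2
```

Writing the text row before the index column means a reader never finds an ID in `sectors` whose text row does not exist yet, although a failed second write can leave an unindexed text behind.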

I was wondering which approach is best for performance. I suspect that super
column families are slower than standard column families, but are they still
slower when the two-column-family design needs two requests to Cassandra
instead of one (with the super column family)?

Kind regards,
T. Akhayo
