Re: Cassandra Database using too much space
Hi Jack,
Thanks for replying.
What I meant by 1.5M words is not 1.5M distinct words; it is the count of
all words we added to the corpus (total word instances). Then, in the
word_frequency and word_ordered_frequency CFs, we have a row for each
distinct word with its frequency.
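To make the distinction concrete, here is a minimal sketch using Python's collections.Counter on a made-up toy corpus (the words and counts are illustrative, not the actual data): total word instances count every occurrence, while the frequency tables hold one entry per distinct word.

```python
from collections import Counter

# Hypothetical toy corpus: total word instances vs. distinct words.
corpus = "the cat sat on the mat the cat slept".split()

total_instances = len(corpus)     # every occurrence counts
word_frequency = Counter(corpus)  # one entry per distinct word

print(total_instances)            # 9 total word instances
print(len(word_frequency))        # 6 distinct words
print(word_frequency["the"])      # 3
```

In the same way, 1.5M total instances can correspond to a much smaller number of distinct-word rows in the two CFs.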
Well, your data model looks fine at a glance: a lot of tables, but they
appear to map to logically obvious query paths. This denormalization
will make your queries fast but eat up more disk, and if disk is really a
pain point, I'd suggest looking at your economics a bit, and look at your
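A rough back-of-envelope for the disk trade-off Ryan describes: writing one logical record into a separate table per query path multiplies storage roughly by the table count. All numbers below are assumptions for illustration, not measurements of the actual cluster.

```python
# Illustrative sizing sketch: denormalizing one logical record into
# several query tables multiplies storage roughly by the table count.
avg_row_bytes = 200          # assumed average serialized row size
distinct_rows = 1_500_000    # assumed number of logical records
query_tables = 4             # assumed: one table per query path

one_copy_gb = avg_row_bytes * distinct_rows / 1e9
denormalized_gb = one_copy_gb * query_tables

print(f"{one_copy_gb:.2f} GB -> {denormalized_gb:.2f} GB")
```

Under these assumed numbers, four query tables turn ~0.3 GB of logical data into ~1.2 GB on disk, before replication and compaction overhead.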
Hi Ryan,
Thank you very much. This helps a lot.
On Sun, Dec 14, 2014 at 9:14 PM, Ryan Svihla rsvi...@datastax.com wrote:
Well, your data model looks fine at a glance: a lot of tables, but they
appear to map to logically obvious query paths. This denormalization
will make your queries
It looks like you will have quite a few "combinatoric explosions" to cope with.
In addition to 1.5M words, you have bigrams and trigrams – combinations of two
and three words. You need to get a handle on the cardinality of each of your
tables. Bigrams and trigrams could give you who knows how many millions
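The cardinality growth Jack warns about can be sketched in a few lines of Python on a hypothetical toy token stream (the tokens are invented for illustration): even a tiny corpus already yields more distinct bigrams than distinct words, and more distinct trigrams than bigrams.

```python
# Toy illustration of n-gram cardinality growth.
def ngrams(tokens, n):
    """Return the list of n-token windows over the token stream."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

tokens = "a b a c a b d a c e".split()

distinct_words = len(set(tokens))
distinct_bigrams = len(set(ngrams(tokens, 2)))
distinct_trigrams = len(set(ngrams(tokens, 3)))

print(distinct_words, distinct_bigrams, distinct_trigrams)  # 5 7 8
```

Estimating these distinct counts per table (words, bigrams, trigrams) before loading the full corpus would show which CFs dominate the disk footprint.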