Your data model looks fine at a glance. There are a lot of tables, but they
appear to map to logically obvious query paths. This denormalization will
make your queries fast but use more disk. If disk is really a pain point,
I'd suggest looking at your economics a bit and weighing your tradeoffs.
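
To put rough numbers on the economics, here is a back-of-envelope projection
using only the figures from your message (1.5 million words, 14GB, a target
of 100 million words). It assumes disk usage grows linearly with word count
and treats 1GB as 10^9 bytes; real usage will vary with compaction,
compression, and replication, so take it as an order-of-magnitude sketch.

```python
# Back-of-envelope projection. Assumption: disk usage grows linearly
# with the number of words inserted, which is only an approximation.
words_loaded = 1_500_000      # words inserted so far (from the original post)
disk_used_gb = 14.0           # disk used so far, in GB (from the original post)
target_words = 100_000_000    # target corpus size (from the original post)

# Average on-disk footprint per word across all denormalized tables.
bytes_per_word = disk_used_gb * 10**9 / words_loaded

# Linear extrapolation to the target corpus size.
projected_gb = disk_used_gb * target_words / words_loaded

print(f"about {bytes_per_word / 1000:.1f} KB per word")
print(f"about {projected_gb:.0f} GB at {target_words:,} words")
```

On those assumptions the full corpus lands somewhere above 900GB, which is the
number to weigh against the cost of disk and the query latency you need.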


   1. If you want less disk usage and can afford longer query times,
   switch from denormalized views to indexes. You'll save disk space at
   the cost of more round trips on a read (read the index value, get the
   partition key, then do another read).
   2. If you really need queries to be as fast as possible, then you're on
   the right path, but you'll have to accept that this is the cost of scale.
   Even with relational databases, I've had to use a similar strategy in the
   past to speed up lookups (in that case it was less about varying query
   parameters and more about queries that would normally require lots of
   joins).
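
The two options above can be sketched with a toy model. This is not Cassandra
code: plain Python dicts stand in for tables, each `get` counts as one round
trip, and every name here (`CountingTable`, `by_word`, `word_index`, `base`)
is hypothetical and not taken from your schema.

```python
# Toy model only: dicts stand in for tables and each get() is counted
# as one round trip. All names are hypothetical, not from the real schema.

class CountingTable:
    def __init__(self):
        self.rows = {}
        self.reads = 0

    def put(self, key, value):
        self.rows[key] = value

    def get(self, key):
        self.reads += 1
        return self.rows[key]

    def stored_chars(self):
        # Crude size proxy: characters in every stored value (keys ignored).
        return sum(len(str(v)) for v in self.rows.values())


sentences = {
    1: "the quick brown fox",
    2: "the lazy dog",
}

by_word = CountingTable()     # option 2: denormalized, full copies per word
word_index = CountingTable()  # option 1: word -> partition keys only
base = CountingTable()        # option 1: single copy of each sentence

for sid, text in sentences.items():
    base.put(sid, text)
    for word in set(text.split()):
        by_word.put(word, by_word.rows.get(word, []) + [text])
        word_index.put(word, word_index.rows.get(word, []) + [sid])


def lookup_denormalized(word):
    # One read returns everything, because the data was copied at write time.
    return by_word.get(word)


def lookup_indexed(word):
    # Read the index to get partition keys, then read each base row.
    ids = word_index.get(word)
    return [base.get(sid) for sid in ids]


same = lookup_denormalized("the") == lookup_indexed("the")
print("same result:", same)
print("denormalized reads:", by_word.reads)
print("indexed reads:", word_index.reads + base.reads)
print("denormalized size:", by_word.stored_chars())
print("indexed size:", word_index.stored_chars() + base.stored_chars())
```

Both lookups return the same sentences. The denormalized table answers in a
single read but stores every sentence once per word, while the indexed layout
stores each sentence once and pays for it with extra reads.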

Hope this helps explain tradeoffs and costs.

On Sun, Dec 14, 2014 at 6:01 AM, Chamila Wijayarathna <
cdwijayarat...@gmail.com> wrote:
>
> Hello all,
>
> We are trying to develop a language corpus by using Cassandra as its
> storage medium.
>
> https://gist.github.com/cdwijayarathna/7550176443ad2229fae0 shows the
> types of information we need to extract from corpus interface.
> So we designed schema at
> https://gist.github.com/cdwijayarathna/6491122063152669839f to use as the
> database. Our target is to develop corpus with 100+ million words.
>
> By now we have inserted about 1.5 million words and database has used
> about 14GB space. Is this a normal scenario or are we doing anything wrong?
> Is there any issue in our data model?
>
> Thank You!
> --
> *Chamila Dilshan Wijayarathna,*
> SMIEEE, SMIESL,
> Undergraduate,
> Department of Computer Science and Engineering,
> University of Moratuwa.
>


-- 


Ryan Svihla

Solution Architect


