On Mon, Feb 22, 2010 at 1:40 PM, Sonny Heer <sonnyh...@gmail.com> wrote:

> Hey,
>
> We are in the process of implementing a cassandra application service.
>
> we have already ingested TB of data using the cassandra bulk loader
> (StorageService).
>
> One of the requirements is to get a data explosion factor as a result of
> denormalization.  Since the writes are going to the memory tables, I'm not
> sure how I could grab stats.  I can't get size of data before ingest since
> some of the data may be duplicated.
>

Are you talking about duplication across nodes due to the replication
factor, or because some rows may still be in the memtable?

I think what you want to do is run bin/nodeprobe flush, then bin/nodeprobe
compact, wait until the system is idle, and then sum the size of everything
in your data paths that starts with the name of your column family.
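
The summing step could be sketched roughly like this in Python. This is only an illustration: the function name and the directory list are made up for the example, and matching on a bare name prefix will also pick up any other column family whose name begins with the same characters.

```python
import os

def column_family_bytes(data_dirs, column_family):
    """Sum the sizes of all files whose names start with `column_family`.

    `data_dirs` is a list of data directories (hypothetical paths chosen by
    the caller). Run this only after flush and compact have finished and the
    node is idle, otherwise the total is still changing.
    """
    total = 0
    for data_dir in data_dirs:
        # Walk subdirectories too, in case files sit below the data path.
        for root, _dirs, files in os.walk(data_dir):
            for name in files:
                if name.startswith(column_family):
                    total += os.path.getsize(os.path.join(root, name))
    return total
```

Dividing that total by the size of the raw input would give an explosion factor, assuming you have some way to measure the input size and that you account for the replication factor if you sum across nodes.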

> Also a general problem we are running into is an easy way to do paging over
> the data set (not just rows but columns).  Looks like now the API has ways
> to do count, but no offset.
>

Columns can easily be paginated via the 'start' and 'finish' parameters.
You can't jump to a random page, but you can provide next/previous
behavior.
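
To make the "next page" idea concrete, here is a small Python sketch. It does not call Cassandra: get_slice below is a hypothetical in-memory stand-in for a column slice query, and it assumes that 'start' and 'finish' are inclusive bounds, that an empty string means "unbounded", and that column names are non-empty and come back in sorted order.

```python
from bisect import bisect_left, bisect_right

def get_slice(sorted_names, start="", finish="", count=100):
    """Hypothetical stand-in for a column slice query.

    Returns up to `count` names from `sorted_names` that fall between
    `start` and `finish` (both inclusive; "" means no bound on that side).
    """
    lo = bisect_left(sorted_names, start) if start else 0
    hi = bisect_right(sorted_names, finish) if finish else len(sorted_names)
    return sorted_names[lo:hi][:count]

def iter_pages(sorted_names, page_size):
    """Yield successive pages by restarting each slice at the last name seen."""
    start = ""
    while True:
        # When resuming, ask for one extra column: `start` is inclusive, so
        # the first result repeats the last column of the previous page.
        want = page_size + (1 if start else 0)
        page = get_slice(sorted_names, start=start, count=want)
        if start:
            page = page[1:]  # drop the column already shown
        if not page:
            return
        yield page
        start = page[-1]
```

With five columns named a through e and a page size of 2, iter_pages yields the columns in pages of two, two, and one. There is no offset anywhere: each page only knows where the previous one ended, which is why jumping to an arbitrary page is not possible with this approach.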

-Brandon
