Hi, I've been doing some testing with Calliope as a way to batch-load data from Spark into Cassandra. My initial results are promising on the performance front, but worrisome on the memory-footprint side.
I'm generating N records of about 50 bytes each and using the UPDATE mutator to insert them into C*. I get an OOM if my memory is below 1 GB per million records, i.e. per ~50 MB of raw data (not counting any RDD/structural overhead); see the code at [1]. So, to avoid confusion: I need 4 GB of RAM to save 4M 50-byte records to Cassandra. That's an order of magnitude more than the raw data.

I understood that Calliope builds on top of Cassandra's Hadoop support, which in turn builds on SSTables and sstableloader. I'd like to know what the memory-usage factor of Calliope is and which parameters I could use to control/tune it.

Any experience/advice on this?

-kr, Gerard.

[1] https://gist.github.com/maasg/68de6016bffe5e71b78c
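PS: for reference, a back-of-the-envelope check of the numbers I'm quoting (assuming exactly 50 bytes per record and the observed ~1 GB of heap per million records; the object and field names are just illustrative):

```scala
object MemCheck extends App {
  // Observed behavior: ~1 GB of heap needed per 1M records.
  val records      = 4000000L                                   // the 4M-record case
  val rawBytes     = records * 50L                              // raw payload, ~190 MB
  val observedHeap = (records / 1000000L) * 1024L * 1024 * 1024 // 4 GB

  // Ratio of heap actually required to raw data size.
  val overhead = observedHeap.toDouble / rawBytes
  println(f"raw: ${rawBytes / (1024 * 1024)} MB, heap: ${observedHeap >> 30} GB, overhead: ~$overhead%.1fx")
}
```

So the factor I'm seeing is roughly 20x the raw data size, which is what prompts the question about where that memory goes.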