Hi, I've been doing some testing with Calliope as a way to batch-load data from Spark into Cassandra. My initial results are promising on the performance front, but worrisome on the memory-footprint side.
I'm generating N records of about 50 bytes each and using the UPDATE mutator to insert them into C*. I get an OOM if my memory is below 1 GB per million records, i.e. per ~50 MB of raw data (not counting any RDD/structural overhead); see the code at [1]. So, to avoid confusion: I need 4 GB of RAM to save 4M 50-byte records to Cassandra. That's an order of magnitude more than the raw data.

I understood that Calliope builds on top of Cassandra's Hadoop support, which in turn builds on SSTables and sstableloader. I'd like to know what the memory-usage factor of Calliope is and which parameters I could use to control/tune it.

Any experience/advice on this?

-kr, Gerard.

[1] https://gist.github.com/maasg/68de6016bffe5e71b78c
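PS: for reference, a back-of-the-envelope check of the numbers I'm quoting (assuming exactly 50 bytes per record and the observed ~1 GB of heap per million records; the object and field names are just illustrative):

```scala
object MemCheck extends App {
  // Observed behavior: ~1 GB of heap needed per 1M records.
  val records      = 4000000L                                   // the 4M-record case
  val rawBytes     = records * 50L                              // raw payload, ~190 MB
  val observedHeap = (records / 1000000L) * 1024L * 1024 * 1024 // 4 GB

  // Ratio of heap actually required to raw data size.
  val overhead = observedHeap.toDouble / rawBytes
  println(f"raw: ${rawBytes / (1024 * 1024)} MB, heap: ${observedHeap >> 30} GB, overhead: ~$overhead%.1fx")
}
```

So the factor I'm seeing is roughly 20x the raw data size, which is what prompts the question about where that memory goes.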