Hi, I’m using Cassandra to store NLP data. The dataset is not that huge (about 1 TB), but I need to iterate over it quite frequently, updating the full dataset (every record, but not necessarily every column).
I’ve run into two problems (I’m using the latest Cassandra):

1. I was trying to copy data from one Cassandra cluster to another via a Python driver, but the driver confused the two instances.
2. While trying to update the full dataset with a simple transformation (again via the Python driver), both single-node and clustered Cassandra run out of memory no matter what settings I try, even when I put a lot of sleeps into the mix. However, simpler transformations (updating just one column, especially when there is a lot of processing overhead) work just fine.

I’m really concerned about #2, since we’re moving all heavy processing to a Spark cluster and will expand it, so I expect much heavier traffic to/from Cassandra.

Any hints, war stories, etc. are very much appreciated!

Thank you,
Pavel Velikhov