I have a requirement to periodically run full table scans on our data. It's mostly for repair tasks or bulk UPDATEs, but I'd prefer to do it in Java because what I need is fairly trivial.
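Below is a minimal sketch of one common way to do a full table scan from a plain Java process. It splits the token ring into contiguous ranges and issues one `token()`-bounded CQL query per range, so each slice can be paged through and processed on its own instead of being loaded whole. The sketch makes several assumptions. It assumes the Murmur3Partitioner. The keyspace `ks`, the table `docs`, the columns `id`, `body` and `flag`, the regex, and every helper name are hypothetical. Driver calls are deliberately left out: the code only builds the ranges and statements a driver would bind and execute, plus the per-row "field matches a regex, so set another column" check. A real per-node daemon would keep only the ranges its local node owns.

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.regex.Pattern;

public class TokenRangeScanSketch {
    // Murmur3Partitioner tokens are signed 64-bit values. The partitioner never
    // assigns Long.MIN_VALUE to a key, so (MIN, MAX] covers every row.
    static final BigInteger MIN = BigInteger.valueOf(Long.MIN_VALUE);
    static final BigInteger MAX = BigInteger.valueOf(Long.MAX_VALUE);

    // Hypothetical schema: ks.docs(id PRIMARY KEY, body text, flag text).
    // The two bind markers take the (exclusive) start and (inclusive) end token.
    static final String SCAN_CQL =
        "SELECT id, body FROM ks.docs WHERE token(id) > ? AND token(id) <= ?";
    static final String UPDATE_CQL = "UPDATE ks.docs SET flag = ? WHERE id = ?";

    // Hypothetical rule: rows whose body mentions this word get flag = 'spam'.
    static final Pattern SPAM = Pattern.compile("(?i)\\bviagra\\b");

    /** Splits the whole ring into n contiguous (start, end] token ranges. */
    static List<long[]> splitRing(int n) {
        List<long[]> ranges = new ArrayList<>();
        BigInteger span = MAX.subtract(MIN); // 2^64 - 1; needs BigInteger
        BigInteger start = MIN;
        for (int i = 1; i <= n; i++) {
            BigInteger end = (i == n)
                ? MAX
                : MIN.add(span.multiply(BigInteger.valueOf(i))
                              .divide(BigInteger.valueOf(n)));
            ranges.add(new long[] {start.longValue(), end.longValue()});
            start = end; // next range begins (exclusively) where this one ended
        }
        return ranges;
    }

    /** Per-row step: the new value for `flag` if `body` matches, else empty. */
    static Optional<String> flagFor(String body) {
        return SPAM.matcher(body).find() ? Optional.of("spam") : Optional.empty();
    }

    public static void main(String[] args) {
        // One scan statement per slice; a daemon could run these sequentially
        // with a small page size to keep memory bounded.
        for (long[] r : splitRing(8)) {
            System.out.printf("%s  -- bind (%d, %d]%n", SCAN_CQL, r[0], r[1]);
        }
        // For each row a scan returns, decide whether to issue the UPDATE.
        flagFor("Buy cheap VIAGRA now").ifPresent(
            f -> System.out.println(UPDATE_CQL + "  -- bind ('" + f + "', <id>)"));
    }
}
```

Running the scan outside the Cassandra JVM, one bounded range at a time, means a bug in the repair logic can at worst exhaust the helper process's heap rather than the database's.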
Pig, Hadoop, etc. are a bit overkill for this. I don't want or need a whole Hadoop or HDFS setup just for this. For example: a full table scan where, if a field matches a regex, another column is set based on that value.

This doesn't seem like it would be too hard. Just write a daemon that looks at the key distribution and runs a scan on the data closest to it (the token ranges owned by the local node). Ideally it would be a separate daemon, so that you couldn't accidentally read all that data into memory and then OOM the Cassandra daemon.

Does this already exist?

--
Founder/CEO Spinn3r.com
Location: *San Francisco, CA*
blog: http://burtonator.wordpress.com
… or check out my Google+ profile <https://plus.google.com/102718274791889610666/posts>
<http://spinn3r.com>