I have a requirement to periodically run full table scans on our data.
It’s mostly for repair tasks or making bulk UPDATEs… but I’d prefer to do
it in Java because I need something mildly trivial.

Pig / Hadoop / etc. are overkill for this.  I don’t want or need a
whole Hadoop or HDFS setup.

For example: do a full table scan and, if a field matches a regex, set
another column based on that value.
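
To make that concrete, here is a rough sketch of the per-row logic I have in mind. The rule itself (pull the host out of a URL-looking field) and the names RegexRepair and derive are made up for illustration; the real regex and target column would be whatever the repair task needs, and the read/write against Cassandra is left out.

```java
import java.util.Locale;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical per-row rule: not tied to any real schema or driver.
final class RegexRepair {
    // Assumed example pattern: capture the host part of an http(s) URL.
    private static final Pattern URL = Pattern.compile("^https?://([^/:]+)");

    /**
     * Returns the value to write to the other column, or empty if the
     * source field does not match and the row should be left alone.
     */
    static Optional<String> derive(String sourceValue) {
        if (sourceValue == null) {
            return Optional.empty();
        }
        Matcher m = URL.matcher(sourceValue);
        if (!m.find()) {
            return Optional.empty();
        }
        return Optional.of(m.group(1).toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        // In the real job this would be called once per scanned row,
        // followed by an UPDATE when a value is present.
        System.out.println(derive("http://Example.com/some/page"));
        System.out.println(derive("not a url"));
    }
}
```

The point of keeping this as a plain function is that the scan loop only has to decide whether to issue an UPDATE for the row or skip it.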

Seems like this wouldn’t be too hard.  Just write a daemon that looks at
the key distribution and runs a scan on the data closest to it.  Ideally
it would run as a separate process, so that you couldn’t accidentally
read all that data into memory and then OOM the Cassandra daemon.
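
Roughly what I'm picturing for the scan side, as a sketch only: carve the token ring into contiguous slices and scan one slice at a time, so no single query drags the whole table into memory. This assumes Murmur3Partitioner (64-bit signed tokens); the names TokenRanges, split and cql are made up, and it splits the ring evenly rather than following each node's actual ownership, which the daemon would have to get from the cluster.

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.List;

// Sketch under the assumption of Murmur3Partitioner's token space.
final class TokenRanges {

    /** Splits the full ring into n contiguous {start, end} slices. */
    static List<long[]> split(int n) {
        if (n < 1) {
            throw new IllegalArgumentException("n must be >= 1");
        }
        BigInteger min = BigInteger.valueOf(Long.MIN_VALUE);
        // Width of the ring; BigInteger avoids long overflow.
        BigInteger width = BigInteger.valueOf(Long.MAX_VALUE).subtract(min);
        List<long[]> out = new ArrayList<>();
        long start = Long.MIN_VALUE;
        for (int i = 1; i <= n; i++) {
            long end = (i == n)
                ? Long.MAX_VALUE
                : min.add(width.multiply(BigInteger.valueOf(i))
                               .divide(BigInteger.valueOf(n))).longValueExact();
            out.add(new long[] {start, end});
            start = end; // next slice begins where this one ended
        }
        return out;
    }

    /** Builds the CQL for one slice; table and key names are placeholders. */
    static String cql(String table, String key, long[] r) {
        // Slices are (start, end], except the first, which includes its start.
        String lower = (r[0] == Long.MIN_VALUE) ? ">=" : ">";
        return "SELECT * FROM " + table
            + " WHERE token(" + key + ") " + lower + " " + r[0]
            + " AND token(" + key + ") <= " + r[1];
    }

    public static void main(String[] args) {
        for (long[] r : split(4)) {
            System.out.println(cql("ks.tbl", "id", r));
        }
    }
}
```

Each slice can then be paged through and handed to the per-row logic, and a slice that fails can be retried on its own.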

Does this already exist?

-- 

Founder/CEO Spinn3r.com
Location: *San Francisco, CA*
blog: http://burtonator.wordpress.com
… or check out my Google+ profile
<https://plus.google.com/102718274791889610666/posts>
<http://spinn3r.com>
