Hello,

I am modifying Cassandra in an academic setting (research code, not for a class), and I have a question about bulk streaming of data across the network between nodes.

Assume that I have a set of key/column family combinations that are known to be current on node A and known to be stale on node B (assume that hinted handoff and similar mechanisms are not in use). I can obviously bring these up to date on B with anti-entropy repair, but that checks all of the data and is CPU- and time-intensive. I have written code that brings this data up to date using the same mechanism as read repair (i.e. one item at a time). This works fine, but it is inefficient.

What I am interested in doing is something in between. I want to bulk stream a series of updates between nodes, as anti-entropy repair does, but I want the data that is sent to be limited to the specific itemized set that I am interested in.

Is this possible with the existing code, assuming that I already have code that keeps track of this set of stale data?

Advice is much appreciated.

Sincerely,
Bill Katsak
Rutgers University
