On Mon, Feb 4, 2013 at 2:11 PM, William Katsak <wkat...@cs.rutgers.edu> wrote:
> Hello,
>
> I am working on some modifications of Cassandra in an academic setting
> (research code, not for a class), and have a question regarding bulk
> streaming of data across the network (e.g. between nodes).
>
> Assume that I have some known set of key/column family combos that are known
> good/current on a node A, and known stale on a node B (forget about hinted
> handoff, etc. assume that this mechanism isn't being used). I can obviously
> bring these up to date on B by using anti-entropy repair, but this checks
> all the data and is CPU/time intensive. I have written code that brings this
> data up to date using the same mechanism as read repair (e.g. an item at a
> time), and this works fine, but is inefficient.
>
> What I am interested in doing is something in between. I want to bulk stream
> a series of updates between nodes like anti-entropy does, but I want the
> data that is sent to only be part of the specific itemized set that I am
> interested in.

If all you want is a ks/cf-specific version of 'nodetool rebuild' then that is a good place to start.

-Brandon