Hi Jimmy,

If you sustain a long downtime, a repair is almost always the way to go.
It sounds like you're asking to what extent a cluster can recover/resync a downed peer. A peer will not actively reacquire all the data it missed while it was down. Recovery happens in a few ways:

1) Hints: Assuming there are enough live peers to satisfy your quorum requirements on write, those peers will queue up the missed writes for up to max_hint_window_in_ms (set in cassandra.yaml). These hints are delivered once the peer recovers.

2) Read repair: With some probability, a read will trigger a consistency check across replicas and repair stale data _for the rows being queried_.

3) Repair: If a machine goes down for longer than max_hint_window_in_ms, AFAIK you _will_ have missing data. If you cannot tolerate this situation, you need to take a look at your tunable consistency and/or trigger a repair.

On Thu, Feb 25, 2016 at 7:26 PM, Jimmy Lin <y2klyf+w...@gmail.com> wrote:

> so far they are not long, just some config change and restart.
> if it is a 2 hrs downtime due to whatever reason, is a repair a better
> option than trying to figure out whether the replication sync finished or
> not?
>
> On Thu, Feb 25, 2016 at 1:09 PM, daemeon reiydelle <daeme...@gmail.com>
> wrote:
>
>> Hmm. What are your processes when a node comes back after "a long
>> offline"? Long enough to take the node offline and do a repair? Run the
>> risk of serving stale data? Parallel repairs? ???
>>
>> So, what sort of time frames are "a long time"?
>>
>> *.......*
>>
>> *Daemeon C.M. Reiydelle*
>> *USA (+1) 415.501.0198*
>> *London (+44) (0) 20 8144 9872*
>>
>> On Thu, Feb 25, 2016 at 11:36 AM, Jimmy Lin <y2k...@gmail.com> wrote:
>>
>>> hi all,
>>>
>>> what are the better ways to check the overall replication status of a
>>> cassandra cluster?
>>>
>>> within a single DC, unless a node is down for a long time, most of the
>>> time i feel it is pretty much a non-issue and things are replicated
>>> pretty fast.
>>> But when a node comes back from a long offline, is there a way to check
>>> that the node has finished its data sync with the other nodes?
>>>
>>> Now across DCs, we have frequent VPN outages (sometimes short, sometimes
>>> long) between DCs. i'd also like to know if there is a way to see how
>>> replication between the DCs is catching up under this condition?
>>>
>>> Also, if i understand correctly, the only guaranteed way to make sure
>>> data are synced is to run a complete repair job, is that correct? I am
>>> trying to see if there is a way to "force a quick replication sync"
>>> between DCs after a vpn outage.
>>> Or maybe this is unnecessary, as Cassandra will catch up as fast as it
>>> can, and there is nothing else we (system admins) can do to make it
>>> faster or better?
>>>
>>>
>>>
>>> Sent from my iPhone
>>>
>>
>>
>
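To make the hint-window tradeoff described above concrete, here is a toy Python sketch of the decision. This is purely illustrative (the function name and structure are made up, not Cassandra internals or the cassandra-driver); the only real value is the stock default of max_hint_window_in_ms, which is 10800000 ms (3 hours):

```python
# Toy model of Cassandra's hinted handoff window (illustrative only).
# MAX_HINT_WINDOW_MS mirrors max_hint_window_in_ms from cassandra.yaml;
# 10800000 ms (3 hours) is the stock default.

MAX_HINT_WINDOW_MS = 10_800_000


def recovery_action(downtime_ms: int,
                    max_hint_window_ms: int = MAX_HINT_WINDOW_MS) -> str:
    """Decide how a rejoining node gets back in sync.

    Within the hint window, live peers replay queued hints to the
    recovered node. Beyond it, hints are no longer stored for that
    node, so only an explicit repair guarantees no missing data
    (read repair only fixes rows that happen to be queried).
    """
    if downtime_ms <= max_hint_window_ms:
        return "hints replayed on recovery"
    return "run a repair (hints expired; data may be missing)"


# Jimmy's 2-hour outage is still inside the default 3-hour window:
print(recovery_action(2 * 60 * 60 * 1000))  # hints replayed on recovery
# A 4-hour outage is not:
print(recovery_action(4 * 60 * 60 * 1000))  # run a repair (...)
```

So for the 2-hour restart case in the thread, hints alone should cover it under default settings; for an outage longer than the window (or an unknown-length VPN partition), a repair is the only guaranteed catch-up.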