Hi Jimmy,

If you sustain a long downtime, repair is almost always the way to go.

It seems like you're asking to what extent a cluster is able to
recover/resync a downed peer.

A peer will not, on its own, go back and reacquire all the data it missed
while it was down. Recovery happens in a few ways:

1) Hints: Assuming that there are enough live peers to satisfy the
consistency level of the write, the coordinator will queue up the writes
meant for the downed peer for up to max_hint_window_in_ms (from
cassandra.yaml). These hints will be delivered once the peer recovers.
2) Read repair: There is a probability that read repair will happen,
meaning that a query will trigger data consistency checks and updates, but
only _for the data being queried_.
3) Repair (nodetool repair).
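
To make the hint window concrete, here is a rough Python sketch of the decision after an outage. The helper name and the example downtimes are made up for illustration. The 3 hour figure is the default max_hint_window_in_ms in a stock cassandra.yaml, so check the value in your own config. Hints are also best effort, so treat "hints" below as "hints may be enough", not as a guarantee.

```python
# Default max_hint_window_in_ms in a stock cassandra.yaml (3 hours).
# Assumption: your cluster may override this; read your own config.
DEFAULT_MAX_HINT_WINDOW_MS = 3 * 60 * 60 * 1000


def recovery_action(downtime_ms, max_hint_window_ms=DEFAULT_MAX_HINT_WINDOW_MS):
    """Hypothetical helper: suggest a recovery step for a node that was down.

    Returns "hints" when the outage fits inside the hint window (hinted
    handoff may cover the missed writes), otherwise "repair".
    """
    if downtime_ms < max_hint_window_ms:
        return "hints"
    return "repair"


if __name__ == "__main__":
    hour_ms = 60 * 60 * 1000
    # A restart after a config change vs. outages of 2 and 5 hours.
    for downtime in (5 * 60 * 1000, 2 * hour_ms, 5 * hour_ms):
        print(downtime, recovery_action(downtime))
```

With the default window, a 2 hour outage still falls inside hinted handoff, while a 5 hour one does not.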

If a machine goes down for longer than max_hint_window_in_ms, AFAIK it
_will_ be missing data when it comes back, since writes that arrive after
the window has passed are no longer hinted. If you cannot tolerate this
situation, you need to take a look at your tunable consistency and/or
trigger a repair.
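
On the tunable consistency side, the usual rule of thumb is that a read sees the latest acknowledged write when the read and write replica sets must overlap, i.e. R + W > RF. A small Python sketch of that check follows. The function names are hypothetical, and RF = 3 is only an example.

```python
def quorum(replication_factor):
    """Number of replicas in a QUORUM for the given replication factor."""
    return replication_factor // 2 + 1


def reads_overlap_writes(read_replicas, write_replicas, replication_factor):
    """True when every read replica set must intersect every write replica
    set, which is the R + W > RF rule of thumb."""
    return read_replicas + write_replicas > replication_factor


if __name__ == "__main__":
    rf = 3  # example replication factor, an assumption for illustration
    # QUORUM reads with QUORUM writes: 2 + 2 > 3
    print(reads_overlap_writes(quorum(rf), quorum(rf), rf))
    # ONE reads with ONE writes: 1 + 1 is not > 3
    print(reads_overlap_writes(1, 1, rf))
```

If the first combination holds for your workload, a node that missed writes can still be outvoted on reads while you wait for a repair to finish.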

On Thu, Feb 25, 2016 at 7:26 PM, Jimmy Lin <y2klyf+w...@gmail.com> wrote:

> So far they are not long, just a config change and a restart.
> If it is a 2-hour downtime for whatever reason, is a repair a better
> option than trying to figure out whether the replication sync has
> finished or not?
>
> On Thu, Feb 25, 2016 at 1:09 PM, daemeon reiydelle <daeme...@gmail.com>
> wrote:
>
>> Hmm. What are your processes when a node comes back after "a long
>> offline"? Long enough to take the node offline and do a repair? Run the
>> risk of serving stale data? Parallel repairs? ???
>>
>> So, what sort of time frames are "a long time"?
>>
>>
>> *.......*
>>
>>
>>
>> *Daemeon C.M. Reiydelle
>> USA (+1) 415.501.0198
>> London (+44) (0) 20 8144 9872*
>>
>> On Thu, Feb 25, 2016 at 11:36 AM, Jimmy Lin <y2k...@gmail.com> wrote:
>>
>>> hi all,
>>>
>>> What are the better ways to check the overall replication status of a
>>> Cassandra cluster?
>>>
>>> Within a single DC, unless a node is down for a long time, most of the
>>> time I feel it is pretty much a non-issue and things are replicated
>>> pretty fast. But when a node comes back from a long time offline, is
>>> there a way to check that the node has finished its data sync with the
>>> other nodes?
>>>
>>> Across DCs, we have frequent VPN outages (sometimes short, sometimes
>>> long) between DCs. I would also like to know if there is a way to find
>>> out how replication between DCs is catching up under this condition.
>>>
>>> Also, if I understand correctly, the only guaranteed way to make sure
>>> data are synced is to run a complete repair job, is that correct? I am
>>> trying to see if there is a way to "force a quick replication sync"
>>> between DCs after a VPN outage. Or maybe this is unnecessary, as
>>> Cassandra will catch up as fast as it can, and there is nothing else we
>>> (system admins) can do to make it faster or better?
>>>
>>>
>>>
>>> Sent from my iPhone
>>>
>>
>>
>
