Hi Jimmy,

For more insight into the hint system, these two blog posts are great resources: http://www.datastax.com/dev/blog/modern-hinted-handoff and http://www.datastax.com/dev/blog/whats-coming-to-cassandra-in-3-0-improved-hint-storage-and-delivery.
For timeframes, that's going to differ based on your read/write patterns and load. Although I haven't tried this before, I believe you can query the system.hints table to see the status of hints queued by the local machine.

--local and --dc are similar in the sense that they are always repairs against the local datacenter; they just differ in syntax. If you sustain a loss of inter-DC connectivity for longer than max_hint_window_in_ms, you'll want to run a cross-DC repair, which is just the standard full repair (without specifying either).

On Mon, Feb 29, 2016 at 7:38 PM, Jimmy Lin <y2klyf+w...@gmail.com> wrote:

> hi Bryan,
> I guess I want to find out if there is any way to tell when data will
> become consistent again in both cases.
>
> If the node is down for shorter than the max_hint_window (say 2 hours out
> of a 3-hour max), is there any way to check the log or JMX etc. to see if
> the hint queue size is back to zero or a lower range?
>
> If the node goes down for longer than the max_hint_window time (say 4
> hours vs. our max of 3 hours), we run a repair job. What is the correct
> nodetool repair syntax to use?
> In particular, what is the difference between -local vs -dc? They both
> seem to indicate repairing nodes within a datacenter, but for a cross-DC
> network outage, we want to repair nodes across DCs, right?
>
> thanks
>
> On Fri, Feb 26, 2016 at 3:38 PM, Bryan Cheng <br...@blockcypher.com>
> wrote:
>
>> Hi Jimmy,
>>
>> If you sustain a long downtime, repair is almost always the way to go.
>>
>> It seems like you're asking to what extent a cluster is able to
>> recover/resync a downed peer.
>>
>> A peer will not attempt to reacquire all the data it has missed while
>> being down. Recovery happens in a few ways:
>>
>> 1) Hints: Assuming that there are enough peers to satisfy your quorum
>> requirements on write, the live peers will queue up these operations for
>> up to max_hint_window_in_ms (from cassandra.yaml). These hints will be
>> delivered once the peer recovers.
>> 2) Read repair: There is a probability that read repair will happen,
>> meaning that a query will trigger data consistency checks and updates
>> _on the query being performed_.
>> 3) Repair.
>>
>> If a machine goes down for longer than max_hint_window_in_ms, AFAIK you
>> _will_ have missing data. If you cannot tolerate this situation, you
>> need to take a look at your tunable consistency and/or trigger a repair.
>>
>> On Thu, Feb 25, 2016 at 7:26 PM, Jimmy Lin <y2klyf+w...@gmail.com> wrote:
>>
>>> So far they are not long, just some config changes and a restart.
>>> If it is a 2-hour downtime due to whatever reason, is a repair a better
>>> option than trying to figure out whether the replication sync finished
>>> or not?
>>>
>>> On Thu, Feb 25, 2016 at 1:09 PM, daemeon reiydelle <daeme...@gmail.com>
>>> wrote:
>>>
>>>> Hmm. What are your processes when a node comes back after "a long
>>>> offline"? Long enough to take the node offline and do a repair? Run
>>>> the risk of serving stale data? Parallel repairs?
>>>>
>>>> So, what sort of time frames are "a long time"?
>>>>
>>>> Daemeon C.M. Reiydelle
>>>> USA (+1) 415.501.0198
>>>> London (+44) (0) 20 8144 9872
>>>>
>>>> On Thu, Feb 25, 2016 at 11:36 AM, Jimmy Lin <y2k...@gmail.com> wrote:
>>>>
>>>>> Hi all,
>>>>>
>>>>> What are the better ways to check the overall replication status of a
>>>>> Cassandra cluster?
>>>>>
>>>>> Within a single DC, unless a node is down for a long time, most of
>>>>> the time I feel it is pretty much a non-issue and things are
>>>>> replicated pretty fast. But when a node comes back from a long
>>>>> offline period, is there a way to check that the node has finished
>>>>> its data sync with other nodes?
>>>>>
>>>>> Now across DCs, we have frequent VPN outages (sometimes short,
>>>>> sometimes long) between DCs. I would also like to know if there is a
>>>>> way to find out how replication between DCs is catching up under
>>>>> this condition.
>>>>>
>>>>> Also, if I understand correctly, the only guaranteed way to make
>>>>> sure data is synced is to run a complete repair job; is that
>>>>> correct? I am trying to see if there is a way to "force a quick
>>>>> replication sync" between DCs after a VPN outage.
>>>>> Or maybe this is unnecessary, as Cassandra will catch up as fast as
>>>>> it can, and there is nothing else we (system admins) can do to make
>>>>> it faster or better?
>>>>>
>>>>> Sent from my iPhone
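The decision logic discussed in this thread — hints cover outages shorter than max_hint_window_in_ms, anything longer needs an anti-entropy repair — can be sketched as a minimal illustration. This is not Cassandra code; the three-hour window, the helper name, and the printed labels are assumptions chosen to match the examples in the thread:

```python
# Sketch of the recovery-path decision from the thread: outages shorter
# than max_hint_window_in_ms are covered by hinted handoff (live peers
# queue and later replay the missed writes); longer outages drop hints,
# so a repair is required. Hypothetical helper, illustrative values only.

MAX_HINT_WINDOW_MS = 3 * 60 * 60 * 1000  # 3 hours, as in the thread's example
HOUR_MS = 60 * 60 * 1000

def recovery_action(downtime_ms: int,
                    max_hint_window_ms: int = MAX_HINT_WINDOW_MS) -> str:
    """Return which mechanism is expected to make the node consistent again."""
    if downtime_ms <= max_hint_window_ms:
        # Peers held hints for the whole outage; they are replayed
        # automatically once the node comes back up.
        return "hinted handoff"
    # Writes past the hint window are missing on the recovered node until
    # a repair runs (a standard full repair for cross-DC outages, per the
    # answer above).
    return "repair"

print(recovery_action(2 * HOUR_MS))  # 2h outage, within the 3h window
print(recovery_action(4 * HOUR_MS))  # 4h outage, past the window
```

For the second case, the thread's answer amounts to running the standard full `nodetool repair` (no -local or -dc flag) so that replicas in both datacenters are compared.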