[ https://issues.apache.org/jira/browse/HDFS-15293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17094792#comment-17094792 ]

Chen Liang commented on HDFS-15293:
-----------------------------------

[~shv] I don't think the issue you mentioned will actually happen with the 
current code, because the check only skips an image if BOTH conditions are met: 
1. the time delta is too small, AND 2. the txnid delta is too small. It's an 
AND, not an OR.
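A minimal Java sketch of the check as described above. The names 
(`RecentImageCheck`, `shouldReject`, `minTimeDeltaSecs`, `minTxnDelta`) are 
hypothetical and this is not the actual NameNode code; it only illustrates that 
an image is rejected when both deltas fall below their thresholds:

```java
// Hypothetical sketch of the current acceptance rule: an uploaded image is
// rejected only when BOTH the time delta and the txnid delta are too small.
class RecentImageCheck {

    /** Returns true if the uploaded image should be rejected as "too recent". */
    static boolean shouldReject(long timeDeltaSecs, long txnDelta,
                                long minTimeDeltaSecs, long minTxnDelta) {
        boolean timeTooSmall = timeDeltaSecs < minTimeDeltaSecs;
        boolean txnTooSmall = txnDelta < minTxnDelta;
        return timeTooSmall && txnTooSmall; // AND, not OR
    }

    public static void main(String[] args) {
        // Time delta too small, but plenty of new transactions: accepted.
        System.out.println(shouldReject(60, 5_000, 3_600, 1_000)); // false
        // Both deltas too small: rejected.
        System.out.println(shouldReject(60, 10, 3_600, 1_000));    // true
    }
}
```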

So in the case you mentioned, it is true that the time delta will always be 
considered too small due to the ridiculously large interval. But if the txnid 
threshold is configured to be small, it is easy to accumulate enough 
transactions, so the txnid delta won't be considered too small. A small time 
delta alone does not lead to rejecting an image.

But indeed, it is possible that in a cluster with a ridiculously large 
interval plus an extremely light load (so the txnid barely makes progress), both 
conditions will always be true. In this case all checkpoints will be rejected. 
Although realistically I don't think there is much value in doing checkpoints 
in such a situation anyway, it is probably not a good idea to change the 
behavior of the system by effectively rejecting every image.
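To make the two scenarios concrete, here is a hypothetical simulation (all 
names are made up for illustration) in which a Standby uploads an image every 
hour and the ANN measures both deltas from the last image it accepted. That 
baseline is an assumption of the sketch, not something taken from the HDFS code:

```java
// Hypothetical simulation: a Standby uploads an image once per hour and the
// ANN compares each upload against the last image it ACCEPTED (assumption).
class CheckpointSimulation {

    static boolean shouldReject(long timeDeltaSecs, long txnDelta,
                                long minTimeDeltaSecs, long minTxnDelta) {
        return timeDeltaSecs < minTimeDeltaSecs && txnDelta < minTxnDelta;
    }

    /** Counts how many of `hours` hourly uploads the ANN accepts. */
    static int acceptedUploads(int hours, long txnsPerHour,
                               long minTimeDeltaSecs, long minTxnDelta) {
        long lastAcceptedTime = 0;
        long lastAcceptedTxid = 0;
        long txid = 0;
        int accepted = 0;
        for (int h = 1; h <= hours; h++) {
            long now = h * 3_600L;   // seconds since start
            txid += txnsPerHour;     // write load since the previous upload
            if (!shouldReject(now - lastAcceptedTime, txid - lastAcceptedTxid,
                              minTimeDeltaSecs, minTxnDelta)) {
                accepted++;
                lastAcceptedTime = now;
                lastAcceptedTxid = txid;
            }
        }
        return accepted;
    }

    public static void main(String[] args) {
        long hugeInterval = 365L * 24 * 3_600; // one year, in seconds
        // Busy cluster, small txn threshold: every hourly upload is accepted.
        System.out.println(acceptedUploads(24, 50_000, hugeInterval, 1_000)); // 24
        // Nearly idle cluster: the txnid delta never reaches the threshold.
        System.out.println(acceptedUploads(24, 10, hugeInterval, 1_000));     // 0
    }
}
```

With a one-year time threshold, the busy cluster has every upload accepted 
because the txnid delta clears the threshold each hour, while the nearly idle 
cluster has every upload rejected for the whole day. In this model the txnid 
delta keeps growing across rejections, so the idle cluster is eventually 
accepted (at hour 100 with these numbers); under a load close to zero that can 
take arbitrarily long.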

Because of this, I'm thinking of removing the txnid condition altogether, so 
the check only looks at the time delta and allows any txnid delta. It seems 
harder to justify blocking every use case where the txnid increases slowly. 
(Time always advances, but the txnid does not necessarily.) I think we were 
mainly targeting the time condition originally.

> Relax the condition for accepting a fsimage when receiving a checkpoint 
> ------------------------------------------------------------------------
>
>                 Key: HDFS-15293
>                 URL: https://issues.apache.org/jira/browse/HDFS-15293
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: namenode
>            Reporter: Chen Liang
>            Assignee: Chen Liang
>            Priority: Major
>              Labels: multi-sbnn
>
> HDFS-12979 introduced logic whereby, if the ANN sees consecutive fsImage 
> uploads from a Standby with only a small delta compared to the previous 
> fsImage, the ANN rejects the image. This is to avoid overly frequent fsImage 
> uploads when there are multiple Standby nodes. However, this check could be 
> too stringent.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
