[ 
https://issues.apache.org/jira/browse/HDFS-15293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17088924#comment-17088924
 ] 

Chen Liang edited comment on HDFS-15293 at 4/21/20, 6:09 PM:
-------------------------------------------------------------

This check might be too stringent for the following reason. Say we configure 
the fsImage interval to 6 hours, and the Standby NameNode (SBN) uploads image A 
at 00:00, but there is a minor time skew before the Active NameNode (ANN) 
actually sees this fsImage, so the ANN sees it at 00:00:00.010. The SBN then 
uploads the next image at 06:00, and the ANN sees this one with a smaller skew, 
at 06:00:00.005. The ANN would then consider the time delta smaller than the 
configured delta of 6 hours and reject this image, even though the shortfall is 
only 5 ms and should be acceptable. Essentially, the current check against the 
exact timestamp is too susceptible to random timing conditions.
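
To make the arithmetic concrete, here is a minimal Java sketch of a strict 
time-delta check applied to the two timestamps from the example. It is not the 
actual NameNode code: the class name, the method name and the millisecond 
representation are all assumptions made for illustration, and any condition 
other than the time delta is left out.

```java
import java.time.Duration;

/** Illustrative only: a hypothetical stand-in for the strict check described above. */
class StrictDeltaCheck {
    // Configured fsImage interval from the example: 6 hours, in milliseconds.
    static final long PERIOD_MS = Duration.ofHours(6).toMillis();

    /** Strict rule: accept only if a full period has elapsed since the last accepted image. */
    static boolean accepts(long lastAcceptedMs, long nowMs) {
        return nowMs - lastAcceptedMs >= PERIOD_MS;
    }

    public static void main(String[] args) {
        long firstSeenMs = 10;               // image A seen by the ANN at 00:00:00.010
        long secondSeenMs = PERIOD_MS + 5;   // next image seen at 06:00:00.005
        long deltaMs = secondSeenMs - firstSeenMs;

        // The delta falls 5 ms short of the configured period, so the strict rule rejects.
        System.out.println("delta ms = " + deltaMs);
        System.out.println("accepted = " + accepts(firstSeenMs, secondSeenMs));
    }
}
```

With these inputs the delta is 5 ms below the 6-hour period, so the second 
image is rejected purely because of the difference in skew.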

The consequence of this issue is that the ANN might miss one image once in a 
while. Even if the ANN rejects the image at 06:00, it will not reject the next 
one the SBN uploads at 12:00, because by that time the delta since the last 
accepted image is guaranteed to be > 6 hours. This means the ANN will never 
miss more than one consecutive image.
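
The "at most one consecutive miss" argument can be checked with a small 
simulation. The sketch below is hypothetical and simplified: the class and 
method names are made up, only the time delta is modelled, the first image is 
assumed to be accepted because there is no earlier one, and the 7 ms skew on 
the 12:00 upload is an arbitrary value chosen for the example.

```java
import java.time.Duration;
import java.util.ArrayList;
import java.util.List;

/** Illustrative only: replays a sequence of arrivals against a strict time-delta rule. */
class UploadSimulation {
    static final long PERIOD_MS = Duration.ofHours(6).toMillis();

    /** Returns, for each arrival time, whether the strict check would accept the image. */
    static List<Boolean> simulate(long[] arrivalsMs) {
        List<Boolean> results = new ArrayList<>();
        Long lastAcceptedMs = null; // no image accepted yet
        for (long nowMs : arrivalsMs) {
            boolean ok = lastAcceptedMs == null || nowMs - lastAcceptedMs >= PERIOD_MS;
            if (ok) {
                // Only an accepted image moves the reference timestamp forward.
                lastAcceptedMs = nowMs;
            }
            results.add(ok);
        }
        return results;
    }

    public static void main(String[] args) {
        long[] arrivalsMs = {
            10,                  // 00:00 upload, seen at 00:00:00.010
            PERIOD_MS + 5,       // 06:00 upload, seen at 06:00:00.005
            2 * PERIOD_MS + 7    // 12:00 upload, seen at 12:00:00.007 (assumed skew)
        };
        System.out.println(simulate(arrivalsMs));
    }
}
```

Because a rejected image does not move the reference timestamp, the 12:00 
upload is compared against the 00:00 image, and a delta of roughly 12 hours 
passes the check comfortably.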


was (Author: vagarychen):
This check might be too stringent for the following reason. Say we configure 
the fsImage interval to 6 hours, and the Standby NameNode (SBN) uploads image A 
at 00:00, but there is a minor time skew before the Active NameNode (ANN) 
actually sees this fsImage, so the ANN sees it at 00:00:00.010. The SBN then 
uploads the next image at 06:00, and the ANN sees this one with a smaller skew, 
at 06:00:00.005. The ANN would then consider the time delta smaller than the 
configured delta of 6 hours and reject this image, even though the shortfall is 
only 5 ms and should be acceptable. Essentially, the current check against the 
exact timestamp is too susceptible to random timing conditions.

The consequence of this issue is that the ANN might miss one image once in a 
while. Even if the ANN rejects the image at 06:00, it will not reject the next 
one the SBN uploads at 12:00. So the ANN will never miss more than one 
consecutive image.

> Relax FSImage upload time delta check restriction
> -------------------------------------------------
>
>                 Key: HDFS-15293
>                 URL: https://issues.apache.org/jira/browse/HDFS-15293
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: namenode
>            Reporter: Chen Liang
>            Assignee: Chen Liang
>            Priority: Major
>
> HDFS-12979 introduced logic so that, if the ANN sees consecutive fsImage 
> uploads from a Standby with a small time delta compared to the previous 
> fsImage, the ANN rejects the image. This avoids overly frequent fsImage 
> uploads when there are multiple Standby nodes. However, this check could be 
> too stringent.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
