[
https://issues.apache.org/jira/browse/HDFS-11146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16195296#comment-16195296
]
Daryn Sharp commented on HDFS-11146:
------------------------------------
I think it looks ok, but I need to think through a few use cases. I was
originally thinking about this from a RU perspective, since we already force
FBRs to accelerate clearing staleness after restarting the DN. That's safe.
The problem is that a non-RU failover might not be safe. The stale check
prevents data loss when DNs have queued invalidations, a failover occurs, and
the new active NN issues its own invalidations to different DNs. Best case, the
block becomes highly under-replicated and is corrected. Worst case, the NN
deletes all replicas...
Kihwal thinks the DN might remove the replica from its map when queueing the
invalidation. If so, that might solve the race with the FBR that clears the
staleness lagging the pending invalidations. Another option may be to flush
the async invalidation queue when a new active is detected via heartbeat
response. At any rate, we need to ensure there's some mechanism to prevent
aggressive de-stalination (I just created and own that term) from jeopardizing
durability.
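
To make the stale check described above concrete, here is a minimal sketch of the idea, not the actual BlockManager code. The class, field, and method names are hypothetical and chosen only for illustration. It assumes the new active NN marks every storage stale at failover and clears the flag only when that storage's FBR arrives; excess-replica deletion is postponed while any holder of the block is still stale.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

/** Illustrative model only; names are hypothetical, not Hadoop APIs. */
final class StaleCheckSketch {

    /** NameNode-side view of one DataNode storage. */
    static final class StorageView {
        final String id;
        // Assumed behaviour: stale after failover until the first FBR arrives.
        boolean stale = true;

        StorageView(String id) { this.id = id; }

        void receiveFullBlockReport() { stale = false; }
    }

    /**
     * Picks a storage whose replica may be invalidated, or returns empty when
     * the block is not over-replicated or when any holder is still stale.
     * A stale holder may have a deletion queued by the previous active NN,
     * so counting its replica as live could lead to deleting too many copies.
     */
    static Optional<StorageView> chooseExcessReplica(List<StorageView> holders,
                                                     int replication) {
        if (holders.size() <= replication) {
            return Optional.empty();
        }
        for (StorageView s : holders) {
            if (s.stale) {
                return Optional.empty(); // postpone until every holder has reported
            }
        }
        // Arbitrary choice for the sketch: the last holder in the list.
        return Optional.of(holders.get(holders.size() - 1));
    }

    public static void main(String[] args) {
        List<StorageView> holders = new ArrayList<>();
        for (int i = 1; i <= 4; i++) {
            holders.add(new StorageView("dn" + i));
        }
        System.out.println(chooseExcessReplica(holders, 3).isPresent());
        holders.forEach(StorageView::receiveFullBlockReport);
        System.out.println(chooseExcessReplica(holders, 3).map(s -> s.id).orElse("none"));
    }
}
```

The trade-off is the one this issue is about: the guard is safe, but excess replicas stay around until the slowest storage sends its FBR.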
> Excess replicas will not be deleted until all storages's FBR received after
> failover
> ------------------------------------------------------------------------------------
>
> Key: HDFS-11146
> URL: https://issues.apache.org/jira/browse/HDFS-11146
> Project: Hadoop HDFS
> Issue Type: Bug
> Reporter: Brahma Reddy Battula
> Assignee: Brahma Reddy Battula
> Attachments: HDFS-11146-002.patch, HDFS-11146-003.patch,
> HDFS-11146-004.patch, HDFS-11146-005.patch, HDFS-11146.patch
>
>
> Excess replicas will not be deleted until all storages's FBR received after
> failover.
> I think the following solution can help.
> *Solution:*
> After a failover, since the DNs are aware of it, they can send another
> block report (FBR) irrespective of the interval. Maybe some shuffle can be
> done, similar to the initial delay.
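
The proposal in the description could be sketched roughly as follows. This is a hedged illustration, not the attached patch: the class name, the way the active NN's identity reaches the DN (a plain string here), and the jitter bound are all assumptions. It only shows the scheduling idea, that a detected change of active NN pulls the next FBR forward by a random delay so that all DNs do not report at once. It does not address the durability concern raised in the comment.

```java
import java.util.Random;

/** Illustrative DataNode-side scheduler; names are hypothetical. */
final class FailoverReportScheduler {
    private final long maxJitterMs;
    private final Random random;
    private String lastActiveNn;      // null until the first heartbeat response
    private long nextFullReportAtMs;  // absolute time of the next planned FBR

    FailoverReportScheduler(long regularIntervalMs, long maxJitterMs,
                            Random random, long nowMs) {
        this.maxJitterMs = maxJitterMs;
        this.random = random;
        this.nextFullReportAtMs = nowMs + regularIntervalMs;
    }

    /**
     * Called for each heartbeat response. If the active NN differs from the
     * one seen previously, schedule an FBR within [now, now + maxJitterMs),
     * unless one is already due sooner.
     */
    void onHeartbeatResponse(String activeNn, long nowMs) {
        boolean failover = lastActiveNn != null && !lastActiveNn.equals(activeNn);
        lastActiveNn = activeNn;
        if (failover) {
            long jitter = (long) (random.nextDouble() * maxJitterMs);
            nextFullReportAtMs = Math.min(nextFullReportAtMs, nowMs + jitter);
        }
    }

    long nextFullReportAtMs() {
        return nextFullReportAtMs;
    }

    public static void main(String[] args) {
        FailoverReportScheduler s =
            new FailoverReportScheduler(21_600_000L, 60_000L, new Random(), 0L);
        s.onHeartbeatResponse("nn1", 1_000L);
        s.onHeartbeatResponse("nn2", 2_000L); // active changed: FBR pulled forward
        System.out.println(s.nextFullReportAtMs());
    }
}
```

Injecting the `Random` and the clock value keeps the sketch testable; a real DN would use its own scheduler and configuration.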