[
https://issues.apache.org/jira/browse/HDFS-14314?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Wei-Chiu Chuang reassigned HDFS-14314:
--------------------------------------
Assignee: star
> fullBlockReportLeaseId should be reset after registering to NN
> --------------------------------------------------------------
>
> Key: HDFS-14314
> URL: https://issues.apache.org/jira/browse/HDFS-14314
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: datanode
> Affects Versions: 2.8.4
> Environment:
>
>
> Reporter: star
> Assignee: star
> Priority: Critical
> Fix For: 2.8.4
>
> Attachments: HDFS-14314-trunk.001.patch, HDFS-14314-trunk.001.patch,
> HDFS-14314-trunk.002.patch, HDFS-14314-trunk.003.patch,
> HDFS-14314-trunk.004.patch, HDFS-14314-trunk.005.patch, HDFS-14314.0.patch,
> HDFS-14314.2.patch, HDFS-14314.patch
>
>
> Since HDFS-7923, which rate-limits DN block reports, a DN asks the active
> NN for a full block report lease ID before sending a full block report, and
> then sends the report together with that lease ID. If the lease ID is
> invalid, the NN rejects the full block report and logs "not in the pending
> set".
> Consider a DN that is sending a full block report while the NN is
> restarted. The DN may later send a full block report with a lease ID
> acquired from the previous NN instance, which is invalid for the new NN
> instance. Although the DN recognizes the new NN instance through the
> heartbeat and re-registers itself, it does not reset the lease ID obtained
> from the previous instance.
> The issue may cause DNs to temporarily go dead, making it unsafe to
> restart the NN, especially in a Hadoop cluster with a large number of DNs.
> HDFS-12914 reported the issue without identifying why it occurred, and it
> remains unresolved.
> To make this clear, look at the code below, taken from the offerService
> method of class BPServiceActor, with unrelated code removed to focus on the
> current issue. fullBlockReportLeaseId is a local variable that holds the
> lease ID from the NN. While the NN is restarting, the blockReport call
> throws an exception, which is caught by the catch block inside the while
> loop, so the statement that sets fullBlockReportLeaseId to 0 is skipped.
> After the NN has restarted, the DN sends a full block report with the stale
> lease ID, which the new NN instance rejects. The DN will not send another
> full block report until the next scheduled one, about an hour later.
> The solution is simple: reset fullBlockReportLeaseId to 0 after any
> exception or after registering with the NN, so that the DN asks the new NN
> instance for a valid fullBlockReportLeaseId.
> {code:java}
> private void offerService() throws Exception {
>   long fullBlockReportLeaseId = 0;
>   //
>   // Now loop for a long time....
>   //
>   while (shouldRun()) {
>     try {
>       final long startTime = scheduler.monotonicNow();
>       //
>       // Every so often, send heartbeat or block-report
>       //
>       final boolean sendHeartbeat = scheduler.isHeartbeatDue(startTime);
>       HeartbeatResponse resp = null;
>       if (sendHeartbeat) {
>
>         boolean requestBlockReportLease = (fullBlockReportLeaseId == 0) &&
>             scheduler.isBlockReportDue(startTime);
>         scheduler.scheduleNextHeartbeat();
>         if (!dn.areHeartbeatsDisabledForTests()) {
>           resp = sendHeartBeat(requestBlockReportLease);
>           assert resp != null;
>           if (resp.getFullBlockReportLeaseId() != 0) {
>             if (fullBlockReportLeaseId != 0) {
>               LOG.warn(nnAddr + " sent back a full block report lease " +
>                   "ID of 0x" +
>                   Long.toHexString(resp.getFullBlockReportLeaseId()) +
>                   ", but we already have a lease ID of 0x" +
>                   Long.toHexString(fullBlockReportLeaseId) + ". " +
>                   "Overwriting old lease ID.");
>             }
>             fullBlockReportLeaseId = resp.getFullBlockReportLeaseId();
>           }
>
>         }
>       }
>
>
>       if ((fullBlockReportLeaseId != 0) || forceFullBr) {
>         // An exception is thrown here while the NN is restarting, so the
>         // reset on the line after the call is skipped.
>         cmds = blockReport(fullBlockReportLeaseId);
>         fullBlockReportLeaseId = 0;
>       }
>
>     } catch(RemoteException re) {
>       // (handling omitted in this excerpt)
>     }
>   } // while (shouldRun())
> } // offerService
> {code}
>
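The failure and the proposed reset can be reproduced in isolation with a small simulation. This is only a sketch under stated assumptions, not the actual patch: FakeNameNode, FakeDataNode, serviceOnce, requestLease and secondPassAccepted are hypothetical names invented for illustration, the real DN obtains its lease through the heartbeat response rather than a direct call, and only the "reset after an exception" half of the proposal is modeled.

```java
import java.io.IOException;
import java.util.HashSet;
import java.util.Set;

class LeaseResetSketch {

    /** Hypothetical stand-in for the NN's lease bookkeeping; not Hadoop code. */
    static final class FakeNameNode {
        private final Set<Long> pendingLeases = new HashSet<>();
        private long epoch = 1;    // bumped on every simulated restart
        private long counter = 0;
        boolean restartDuringNextReport = false;

        /** Issues a non-zero lease ID that is valid only for the current instance. */
        long requestLease() {
            long id = (epoch << 32) | ++counter;
            pendingLeases.add(id);
            return id;
        }

        /** Returns false when the lease is unknown ("not in the pending set"). */
        boolean blockReport(long leaseId) throws IOException {
            if (restartDuringNextReport) {
                restartDuringNextReport = false;
                pendingLeases.clear();  // the new instance knows no old leases
                epoch++;
                throw new IOException("NameNode restarted during block report");
            }
            return pendingLeases.remove(leaseId);
        }
    }

    /** Hypothetical, heavily simplified version of the DN service loop body. */
    static final class FakeDataNode {
        private final boolean resetLeaseOnFailure;
        private long fullBlockReportLeaseId = 0;

        FakeDataNode(boolean resetLeaseOnFailure) {
            this.resetLeaseOnFailure = resetLeaseOnFailure;
        }

        /** One loop pass; returns true if the NN accepted a full block report. */
        boolean serviceOnce(FakeNameNode nn) {
            try {
                if (fullBlockReportLeaseId == 0) {
                    fullBlockReportLeaseId = nn.requestLease();
                }
                boolean accepted = nn.blockReport(fullBlockReportLeaseId);
                fullBlockReportLeaseId = 0;  // skipped when blockReport throws
                return accepted;
            } catch (IOException e) {
                if (resetLeaseOnFailure) {
                    fullBlockReportLeaseId = 0;  // the proposed reset
                }
                return false;
            }
        }
    }

    /** The first pass hits the restart; the result is whether the second pass succeeds. */
    static boolean secondPassAccepted(boolean resetLeaseOnFailure) {
        FakeNameNode nn = new FakeNameNode();
        FakeDataNode dn = new FakeDataNode(resetLeaseOnFailure);
        nn.restartDuringNextReport = true;
        dn.serviceOnce(nn);         // exception; lease possibly left stale
        return dn.serviceOnce(nn);  // retry against the "new" instance
    }

    public static void main(String[] args) {
        System.out.println("without reset: accepted=" + secondPassAccepted(false));
        System.out.println("with reset:    accepted=" + secondPassAccepted(true));
    }
}
```

In this model the retry without the reset reuses the stale lease and is rejected, while the retry with the reset requests a fresh lease from the restarted instance and is accepted.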
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)