[
https://issues.apache.org/jira/browse/HDFS-14314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16782320#comment-16782320
]
Hadoop QA commented on HDFS-14314:
----------------------------------
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 47s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 21m 47s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 35s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 7s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 29s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 15m 30s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 30s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 3s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 19s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 16s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 16s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 55s{color} | {color:green} hadoop-hdfs-project/hadoop-hdfs: The patch generated 0 new + 50 unchanged - 2 fixed = 50 total (was 52) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 22s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 14m 48s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 31s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 59s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red}107m 16s{color} | {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 32s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black}176m 20s{color} | {color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.hdfs.tools.TestDFSZKFailoverController |
| | hadoop.hdfs.TestDistributedFileSystem |
| | hadoop.hdfs.qjournal.server.TestJournalNodeSync |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:8f97d6f |
| JIRA Issue | HDFS-14314 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12960859/HDFS-14314-trunk.006.patch |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle |
| uname | Linux 45d325f769c3 3.13.0-153-generic #203-Ubuntu SMP Thu Jun 14 08:52:28 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / bc6fe7a |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_191 |
| findbugs | v3.1.0-RC1 |
| unit | https://builds.apache.org/job/PreCommit-HDFS-Build/26385/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt |
| Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/26385/testReport/ |
| Max. process+thread count | 3102 (vs. ulimit of 10000) |
| modules | C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs |
| Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/26385/console |
| Powered by | Apache Yetus 0.8.0 http://yetus.apache.org |
This message was automatically generated.
> fullBlockReportLeaseId should be reset after registering to NN
> --------------------------------------------------------------
>
> Key: HDFS-14314
> URL: https://issues.apache.org/jira/browse/HDFS-14314
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: datanode
> Affects Versions: 2.8.4
> Environment:
>
>
> Reporter: star
> Assignee: star
> Priority: Critical
> Fix For: 2.8.4
>
> Attachments: HDFS-14314-trunk.001.patch, HDFS-14314-trunk.001.patch,
> HDFS-14314-trunk.002.patch, HDFS-14314-trunk.003.patch,
> HDFS-14314-trunk.004.patch, HDFS-14314-trunk.005.patch,
> HDFS-14314-trunk.006.patch, HDFS-14314.0.patch, HDFS-14314.2.patch,
> HDFS-14314.patch
>
>
> Since HDFS-7923, which rate-limits DN block reports, a DN asks the active NN
> for a full block report lease ID before sending a full block report. The DN
> then sends the full block report together with the lease ID. If the lease ID
> is invalid, the NN rejects the full block report and logs "not in the pending
> set".
> Consider the case where the NN is restarted while a DN is doing a full block
> report. The DN may later send a full block report with a lease ID acquired
> from the previous NN instance, which is invalid for the new NN instance.
> Although the DN recognizes the new NN instance via heartbeat and re-registers
> itself, it does not reset the lease ID from the previous instance.
> The issue may cause DNs to temporarily go dead, making it unsafe to restart
> the NN, especially in a Hadoop cluster with a large number of DNs.
> HDFS-12914 reported the issue without any clue as to why it occurred, and it
> remains unsolved.
> To make it clear, look at the code below, taken from method offerService of
> class BPServiceActor. Some code has been removed to focus on the current
> issue. fullBlockReportLeaseId is a local variable that holds the lease ID
> from the NN. While the NN is restarting, the blockReport call throws an
> exception, which is caught by the catch block in the while loop, so
> fullBlockReportLeaseId is not set to 0. After the NN has restarted, the DN
> sends a full block report with the stale lease ID, which the new NN instance
> rejects. The DN will not send another full block report until the next
> scheduled full block report, about an hour later.
> The solution is simple: reset fullBlockReportLeaseId to 0 after any
> exception or after registering to the NN. The DN will then ask the new NN
> instance for a valid fullBlockReportLeaseId.
> {code:java}
> private void offerService() throws Exception {
>   long fullBlockReportLeaseId = 0;
>   //
>   // Now loop for a long time....
>   //
>   while (shouldRun()) {
>     try {
>       final long startTime = scheduler.monotonicNow();
>       //
>       // Every so often, send heartbeat or block-report
>       //
>       final boolean sendHeartbeat = scheduler.isHeartbeatDue(startTime);
>       HeartbeatResponse resp = null;
>       if (sendHeartbeat) {
>         // A lease is requested only when none is held and a report is due.
>         boolean requestBlockReportLease = (fullBlockReportLeaseId == 0) &&
>             scheduler.isBlockReportDue(startTime);
>         scheduler.scheduleNextHeartbeat();
>         if (!dn.areHeartbeatsDisabledForTests()) {
>           resp = sendHeartBeat(requestBlockReportLease);
>           assert resp != null;
>           if (resp.getFullBlockReportLeaseId() != 0) {
>             if (fullBlockReportLeaseId != 0) {
>               LOG.warn(nnAddr + " sent back a full block report lease " +
>                   "ID of 0x" +
>                   Long.toHexString(resp.getFullBlockReportLeaseId()) +
>                   ", but we already have a lease ID of 0x" +
>                   Long.toHexString(fullBlockReportLeaseId) + ". " +
>                   "Overwriting old lease ID.");
>             }
>             fullBlockReportLeaseId = resp.getFullBlockReportLeaseId();
>           }
>           // ... (code elided)
>         }
>       }
>       // ... (code elided, including the declarations of cmds and forceFullBr)
>       if ((fullBlockReportLeaseId != 0) || forceFullBr) {
>         // An exception occurs here while the NN is restarting
>         cmds = blockReport(fullBlockReportLeaseId);
>         // Skipped when blockReport throws, so the stale lease ID is kept
>         fullBlockReportLeaseId = 0;
>       }
>       // ... (code elided)
>     } catch(RemoteException re) {
>       // ... (handling elided); fullBlockReportLeaseId is not reset here
>     }
>   } // while (shouldRun())
> } // offerService
> {code}
>
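The stale-lease sequence described above can be reproduced with a small standalone model. This is only a sketch under stated assumptions: FakeNameNode, FakeDataNode, and every method on them are hypothetical stand-ins invented for illustration, not HDFS classes, and the lease bookkeeping is reduced to a set of IDs. It is not the attached patch.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical model of the issue; none of these types exist in HDFS.
class LeaseResetSketch {

  /** Stand-in for one NN instance: it only knows leases it issued itself. */
  static class FakeNameNode {
    private final Set<Long> pendingLeases = new HashSet<>();
    private long nextLeaseId;

    FakeNameNode(long firstLeaseId) {
      this.nextLeaseId = firstLeaseId;
    }

    long requestLease() {
      long id = nextLeaseId++;
      pendingLeases.add(id);
      return id;
    }

    /** Returns false for a lease that is "not in the pending set". */
    boolean acceptFullBlockReport(long leaseId) {
      return pendingLeases.remove(leaseId);
    }
  }

  /** Stand-in for the DN-side state held in offerService. */
  static class FakeDataNode {
    private long fullBlockReportLeaseId = 0;
    private final boolean resetOnReRegister;

    FakeDataNode(boolean resetOnReRegister) {
      this.resetOnReRegister = resetOnReRegister;
    }

    /** Asks for a lease only when none is currently held. */
    void heartbeat(FakeNameNode nn) {
      if (fullBlockReportLeaseId == 0) {
        fullBlockReportLeaseId = nn.requestLease();
      }
    }

    /** The proposed fix: drop the old lease when registering with a NN. */
    void reRegister() {
      if (resetOnReRegister) {
        fullBlockReportLeaseId = 0;
      }
    }

    boolean sendFullBlockReport(FakeNameNode nn) {
      boolean accepted = nn.acceptFullBlockReport(fullBlockReportLeaseId);
      fullBlockReportLeaseId = 0; // cleared only after the call returns
      return accepted;
    }
  }

  /** Plays the restart scenario and reports whether the new NN accepts. */
  static boolean reportAcceptedAfterRestart(boolean resetOnReRegister) {
    FakeDataNode dn = new FakeDataNode(resetOnReRegister);
    FakeNameNode oldNn = new FakeNameNode(100);
    dn.heartbeat(oldNn); // lease acquired from the old instance

    // The NN restarts before the report is delivered; the failed
    // blockReport call leaves the old lease ID in place.
    FakeNameNode newNn = new FakeNameNode(500);
    dn.reRegister();
    dn.heartbeat(newNn); // requests a new lease only if the old one is gone
    return dn.sendFullBlockReport(newNn);
  }

  public static void main(String[] args) {
    System.out.println("without reset, accepted = "
        + reportAcceptedAfterRestart(false));
    System.out.println("with reset, accepted = "
        + reportAcceptedAfterRestart(true));
  }
}
```

In this model the report is rejected when the lease ID survives re-registration and accepted once it is reset, which matches the behavior described in the issue.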