[ 
https://issues.apache.org/jira/browse/HDFS-13672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16518158#comment-16518158
 ] 

genericqa commented on HDFS-13672:
----------------------------------

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
25s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 27m 
13s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
57s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
13s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m  
3s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m 12s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
56s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
46s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
 1s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
54s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
54s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
10s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m  
0s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m  5s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m  
0s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
45s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 94m 33s{color} 
| {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
30s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}158m  8s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | 
hadoop.hdfs.TestDFSStripedOutputStreamWithFailureWithRandomECPolicy |
|   | hadoop.hdfs.server.balancer.TestBalancer |
|   | hadoop.hdfs.server.namenode.TestReencryptionWithKMS |
|   | hadoop.hdfs.client.impl.TestBlockReaderLocal |
|   | hadoop.hdfs.TestDFSClientRetries |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:abb62dd |
| JIRA Issue | HDFS-13672 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12928461/HDFS-13672.001.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 1a5051bc4e08 3.13.0-143-generic #192-Ubuntu SMP Tue Feb 27 
10:45:36 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 2d87592 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_171 |
| findbugs | v3.1.0-RC1 |
| unit | 
https://builds.apache.org/job/PreCommit-HDFS-Build/24480/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-HDFS-Build/24480/testReport/ |
| Max. process+thread count | 2874 (vs. ulimit of 10000) |
| modules | C: hadoop-hdfs-project/hadoop-hdfs U: 
hadoop-hdfs-project/hadoop-hdfs |
| Console output | 
https://builds.apache.org/job/PreCommit-HDFS-Build/24480/console |
| Powered by | Apache Yetus 0.8.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.



> clearCorruptLazyPersistFiles could crash NameNode
> -------------------------------------------------
>
>                 Key: HDFS-13672
>                 URL: https://issues.apache.org/jira/browse/HDFS-13672
>             Project: Hadoop HDFS
>          Issue Type: Bug
>            Reporter: Wei-Chiu Chuang
>            Assignee: Gabor Bota
>            Priority: Major
>         Attachments: HDFS-13672.001.patch
>
>
> I started a NameNode on a pretty large fsimage. Since the NameNode is started 
> without any DataNodes, all blocks (100 million) are "corrupt".
> Afterwards I observed FSNamesystem#clearCorruptLazyPersistFiles() held write 
> lock for a long time:
> {noformat}
> 18/06/12 12:37:03 INFO namenode.FSNamesystem: FSNamesystem write lock held 
> for 46024 ms via
> java.lang.Thread.getStackTrace(Thread.java:1559)
> org.apache.hadoop.util.StringUtils.getStackTrace(StringUtils.java:945)
> org.apache.hadoop.hdfs.server.namenode.FSNamesystemLock.writeUnlock(FSNamesystemLock.java:198)
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.writeUnlock(FSNamesystem.java:1689)
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem$LazyPersistFileScrubber.clearCorruptLazyPersistFiles(FSNamesystem.java:5532)
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem$LazyPersistFileScrubber.run(FSNamesystem.java:5543)
> java.lang.Thread.run(Thread.java:748)
>         Number of suppressed write-lock reports: 0
>         Longest write-lock held interval: 46024
> {noformat}
> Here's the relevant code:
> {code}
>       writeLock();
>       try {
>         final Iterator<BlockInfo> it =
>             blockManager.getCorruptReplicaBlockIterator();
>         while (it.hasNext()) {
>           Block b = it.next();
>           BlockInfo blockInfo = blockManager.getStoredBlock(b);
>           if (blockInfo.getBlockCollection().getStoragePolicyID() == 
> lpPolicy.getId()) {
>             filesToDelete.add(blockInfo.getBlockCollection());
>           }
>         }
>         for (BlockCollection bc : filesToDelete) {
>           LOG.warn("Removing lazyPersist file " + bc.getName() + " with no 
> replicas.");
>           changed |= deleteInternal(bc.getName(), false, false, false);
>         }
>       } finally {
>         writeUnlock();
>       }
> {code}
> In essence, the iteration over corrupt replica list should be broken down 
> into smaller iterations to avoid a single long wait.
> Since this operation holds NameNode write lock for more than 45 seconds, the 
> default ZKFC connection timeout, it implies an extreme case like this (100 
> million corrupt blocks) could lead to NameNode failover.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to