[
https://issues.apache.org/jira/browse/HDFS-13672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16553250#comment-16553250
]
genericqa commented on HDFS-13672:
----------------------------------
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m
11s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m
0s{color} | {color:red} The patch doesn't appear to include any new or modified
tests. Please justify why no new tests are needed for this patch. Also please
list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 26m
47s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m
4s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m
13s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m
7s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green}
12m 21s{color} | {color:green} branch has no errors when building and testing
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m
51s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m
51s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m
10s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m
59s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m
59s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m
8s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m
58s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m
0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green}
12m 1s{color} | {color:green} patch has no errors when building and testing
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m
6s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m
49s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 80m 15s{color}
| {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m
32s{color} | {color:green} The patch does not generate ASF License warnings.
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}143m 49s{color} |
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests |
hadoop.hdfs.TestDFSStripedOutputStreamWithFailureWithRandomECPolicy |
| | hadoop.tools.TestHdfsConfigFields |
| | hadoop.hdfs.server.datanode.fsdataset.impl.TestLazyPersistFiles |
| | hadoop.hdfs.server.namenode.ha.TestEditLogTailer |
| | hadoop.hdfs.client.impl.TestBlockReaderLocal |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:ba1ab08 |
| JIRA Issue | HDFS-13672 |
| JIRA Patch URL |
https://issues.apache.org/jira/secure/attachment/12932727/HDFS-13672.003.patch |
| Optional Tests | asflicense compile javac javadoc mvninstall mvnsite
unit shadedclient findbugs checkstyle |
| uname | Linux 6d7229b49206 3.13.0-153-generic #203-Ubuntu SMP Thu Jun 14
08:52:28 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / bbe2f62 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_171 |
| findbugs | v3.1.0-RC1 |
| unit |
https://builds.apache.org/job/PreCommit-HDFS-Build/24636/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt
|
| Test Results |
https://builds.apache.org/job/PreCommit-HDFS-Build/24636/testReport/ |
| Max. process+thread count | 4007 (vs. ulimit of 10000) |
| modules | C: hadoop-hdfs-project/hadoop-hdfs U:
hadoop-hdfs-project/hadoop-hdfs |
| Console output |
https://builds.apache.org/job/PreCommit-HDFS-Build/24636/console |
| Powered by | Apache Yetus 0.8.0-SNAPSHOT http://yetus.apache.org |
This message was automatically generated.
> clearCorruptLazyPersistFiles could crash NameNode
> -------------------------------------------------
>
> Key: HDFS-13672
> URL: https://issues.apache.org/jira/browse/HDFS-13672
> Project: Hadoop HDFS
> Issue Type: Bug
> Reporter: Wei-Chiu Chuang
> Assignee: Gabor Bota
> Priority: Major
> Attachments: HDFS-13672.001.patch, HDFS-13672.002.patch,
> HDFS-13672.003.patch
>
>
> I started a NameNode on a pretty large fsimage. Since the NameNode is started
> without any DataNodes, all blocks (100 million) are "corrupt".
> Afterwards I observed FSNamesystem#clearCorruptLazyPersistFiles() held write
> lock for a long time:
> {noformat}
> 18/06/12 12:37:03 INFO namenode.FSNamesystem: FSNamesystem write lock held
> for 46024 ms via
> java.lang.Thread.getStackTrace(Thread.java:1559)
> org.apache.hadoop.util.StringUtils.getStackTrace(StringUtils.java:945)
> org.apache.hadoop.hdfs.server.namenode.FSNamesystemLock.writeUnlock(FSNamesystemLock.java:198)
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.writeUnlock(FSNamesystem.java:1689)
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem$LazyPersistFileScrubber.clearCorruptLazyPersistFiles(FSNamesystem.java:5532)
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem$LazyPersistFileScrubber.run(FSNamesystem.java:5543)
> java.lang.Thread.run(Thread.java:748)
> Number of suppressed write-lock reports: 0
> Longest write-lock held interval: 46024
> {noformat}
> Here's the relevant code:
> {code}
> writeLock();
> try {
> final Iterator<BlockInfo> it =
> blockManager.getCorruptReplicaBlockIterator();
> while (it.hasNext()) {
> Block b = it.next();
> BlockInfo blockInfo = blockManager.getStoredBlock(b);
> if (blockInfo.getBlockCollection().getStoragePolicyID() ==
> lpPolicy.getId()) {
> filesToDelete.add(blockInfo.getBlockCollection());
> }
> }
> for (BlockCollection bc : filesToDelete) {
> LOG.warn("Removing lazyPersist file " + bc.getName() + " with no
> replicas.");
> changed |= deleteInternal(bc.getName(), false, false, false);
> }
> } finally {
> writeUnlock();
> }
> {code}
> In essence, the iteration over corrupt replica list should be broken down
> into smaller iterations to avoid a single long wait.
> Since this operation holds NameNode write lock for more than 45 seconds, the
> default ZKFC connection timeout, it implies an extreme case like this (100
> million corrupt blocks) could lead to NameNode failover.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]