[
https://issues.apache.org/jira/browse/HDFS-10659?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16818594#comment-16818594
]
Hadoop QA commented on HDFS-10659:
----------------------------------
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 24s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s{color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 21m 25s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 16s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 0s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 23s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 15m 18s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 55s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 9s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 18s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 9s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 9s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 0m 47s{color} | {color:orange} hadoop-hdfs-project/hadoop-hdfs: The patch generated 1 new + 17 unchanged - 1 fixed = 18 total (was 18) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 18s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 14m 1s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 34s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 54s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red}107m 5s{color} | {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 33s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black}174m 12s{color} | {color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.hdfs.TestDFSClientRetries |
| | hadoop.hdfs.server.namenode.TestNameNodeMXBean |
| | hadoop.hdfs.server.datanode.TestDataNodeUUID |
| | hadoop.hdfs.web.TestWebHdfsTimeouts |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:bdbca0e |
| JIRA Issue | HDFS-10659 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12966008/HDFS-10659.006.patch |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle |
| uname | Linux 0df007c33fe2 4.4.0-144-generic #170~14.04.1-Ubuntu SMP Mon Mar 18 15:02:05 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 5583e1b |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_191 |
| findbugs | v3.1.0-RC1 |
| checkstyle | https://builds.apache.org/job/PreCommit-HDFS-Build/26640/artifact/out/diff-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt |
| unit | https://builds.apache.org/job/PreCommit-HDFS-Build/26640/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt |
| Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/26640/testReport/ |
| Max. process+thread count | 3105 (vs. ulimit of 10000) |
| modules | C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs |
| Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/26640/console |
| Powered by | Apache Yetus 0.8.0 http://yetus.apache.org |
This message was automatically generated.
> Namenode crashes after Journalnode re-installation in an HA cluster due to
> missing paxos directory
> --------------------------------------------------------------------------------------------------
>
> Key: HDFS-10659
> URL: https://issues.apache.org/jira/browse/HDFS-10659
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: ha, journal-node
> Affects Versions: 2.7.0
> Reporter: Amit Anand
> Assignee: star
> Priority: Major
> Attachments: HDFS-10659.000.patch, HDFS-10659.001.patch,
> HDFS-10659.002.patch, HDFS-10659.003.patch, HDFS-10659.004.patch,
> HDFS-10659.005.patch, HDFS-10659.006.patch
>
>
> In my environment I am seeing {{Namenodes}} crash after a majority of the
> {{Journalnodes}} are re-installed. We manage multiple clusters and do rolling
> upgrades followed by rolling re-installs of each node, including master (NN,
> JN, RM, ZK) nodes. When a journal node is re-installed or moved to a new
> disk/host, instead of running the {{"initializeSharedEdits"}} command, I copy
> the {{VERSION}} file from one of the other {{Journalnodes}}, which allows my
> {{NN}} to start writing data to the newly installed {{Journalnode}}.
> To achieve JN quorum and recover unfinalized segments, the NN during startup
> creates NNNN.tmp files under the {{"<disk>/jn/current/paxos"}} directory. In
> the current implementation the "paxos" directory is only created during the
> {{"initializeSharedEdits"}} command, so if a JN is re-installed the "paxos"
> directory is not created upon JN startup or by the NN while writing NNNN.tmp
> files, which causes the NN to crash with the following error message:
> {code}
> 192.168.100.16:8485: /disk/1/dfs/jn/Test-Laptop/current/paxos/64044.tmp (No such file or directory)
> at java.io.FileOutputStream.open(Native Method)
> at java.io.FileOutputStream.<init>(FileOutputStream.java:221)
> at java.io.FileOutputStream.<init>(FileOutputStream.java:171)
> at org.apache.hadoop.hdfs.util.AtomicFileOutputStream.<init>(AtomicFileOutputStream.java:58)
> at org.apache.hadoop.hdfs.qjournal.server.Journal.persistPaxosData(Journal.java:971)
> at org.apache.hadoop.hdfs.qjournal.server.Journal.acceptRecovery(Journal.java:846)
> at org.apache.hadoop.hdfs.qjournal.server.JournalNodeRpcServer.acceptRecovery(JournalNodeRpcServer.java:205)
> at org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolServerSideTranslatorPB.acceptRecovery(QJournalProtocolServerSideTranslatorPB.java:249)
> at org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocolProtos$QJournalProtocolService$2.callBlockingMethod(QJournalProtocolProtos.java:25435)
> at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2151)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2147)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2145)
> {code}
> The current
> [getPaxosFile|https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/qjournal/server/JNStorage.java#L128-L130]
> method simply returns a path to a file under the "paxos" directory without
> verifying that the directory exists. Since the "paxos" directory holds files
> that are required for NN recovery and for achieving JN quorum, my proposed
> solution is to add a check to the "getPaxosFile" method and create the
> {{"paxos"}} directory if it is missing.
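> A minimal sketch of such a check is shown below (illustrative only, not the
> attached patch; the {{getPaxosDir()}} helper inside {{JNStorage}} and the
> choice of exception are assumptions):
> {code}
> // Sketch: create the "paxos" directory on demand so a freshly re-installed
> // JN does not make the NN fail with "No such file or directory" during
> // segment recovery. Assumes a getPaxosDir() helper that returns
> // <storage>/current/paxos.
> File getPaxosFile(long segmentTxId) {
>   File paxosDir = getPaxosDir();
>   if (!paxosDir.exists() && !paxosDir.mkdirs()) {
>     throw new IllegalStateException(
>         "Could not create missing paxos directory: " + paxosDir);
>   }
>   return new File(paxosDir, String.valueOf(segmentTxId));
> }
> {code}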
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]