[ https://issues.apache.org/jira/browse/HBASE-13832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14619752#comment-14619752 ]
Hadoop QA commented on HBASE-13832:
-----------------------------------
{color:red}-1 overall{color}. Here are the results of testing the latest
attachment
http://issues.apache.org/jira/secure/attachment/12744362/HBASE-13832-v6.patch
against the master branch at commit f5ad736282c8c9c27b14131919d60b72834ec9e4.
ATTACHMENT ID: 12744362
{color:green}+1 @author{color}. The patch does not contain any @author
tags.
{color:green}+1 tests included{color}. The patch appears to include 12 new
or modified tests.
{color:green}+1 hadoop versions{color}. The patch compiles with all
supported hadoop versions (2.4.0 2.4.1 2.5.0 2.5.1 2.5.2 2.6.0 2.7.0)
{color:green}+1 javac{color}. The applied patch does not increase the
total number of javac compiler warnings.
{color:green}+1 protoc{color}. The applied patch does not increase the
total number of protoc compiler warnings.
{color:green}+1 javadoc{color}. The javadoc tool did not generate any
warning messages.
{color:green}+1 checkstyle{color}. The applied patch does not increase the
total number of checkstyle errors.
{color:green}+1 findbugs{color}. The patch does not introduce any new
Findbugs (version 2.0.3) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase
the total number of release audit warnings.
{color:green}+1 lineLengths{color}. The patch does not introduce lines
longer than 100 characters.
{color:green}+1 site{color}. The mvn post-site goal succeeds with this patch.
{color:red}-1 core tests{color}. The patch failed these unit tests:
{color:red}-1 core zombie tests{color}. There is 1 zombie test:
at org.apache.qpid.server.queue.ProducerFlowControlTest.testCapacityExceededCausesBlock(ProducerFlowControlTest.java:123)
at org.apache.qpid.test.utils.QpidBrokerTestCase.runBare(QpidBrokerTestCase.java:323)
at org.apache.qpid.test.utils.QpidTestCase.run(QpidTestCase.java:155)
Test results:
https://builds.apache.org/job/PreCommit-HBASE-Build/14709//testReport/
Release Findbugs (version 2.0.3) warnings:
https://builds.apache.org/job/PreCommit-HBASE-Build/14709//artifact/patchprocess/newFindbugsWarnings.html
Checkstyle Errors:
https://builds.apache.org/job/PreCommit-HBASE-Build/14709//artifact/patchprocess/checkstyle-aggregate.html
Console output:
https://builds.apache.org/job/PreCommit-HBASE-Build/14709//console
This message is automatically generated.
> Procedure V2: master fails to start due to WALProcedureStore sync failures
> when HDFS datanode count is low
> -----------------------------------------------------------------------------------------------------------
>
> Key: HBASE-13832
> URL: https://issues.apache.org/jira/browse/HBASE-13832
> Project: HBase
> Issue Type: Sub-task
> Components: master, proc-v2
> Affects Versions: 2.0.0, 1.1.0, 1.2.0
> Reporter: Stephen Yuan Jiang
> Assignee: Matteo Bertozzi
> Priority: Blocker
> Fix For: 2.0.0, 1.2.0, 1.1.2, 1.3.0
>
> Attachments: HBASE-13832-v0.patch, HBASE-13832-v1.patch,
> HBASE-13832-v2.patch, HBASE-13832-v4.patch, HBASE-13832-v5.patch,
> HBASE-13832-v6.patch, HDFSPipeline.java, hbase-13832-test-hang.patch,
> hbase-13832-v3.patch
>
>
> When the datanode count is < 3, we get a failure in WALProcedureStore#syncLoop() during
> master start. The failure prevents the master from starting.
> {noformat}
> 2015-05-29 13:27:16,625 ERROR [WALProcedureStoreSyncThread]
> wal.WALProcedureStore: Sync slot failed, abort.
> java.io.IOException: Failed to replace a bad datanode on the existing pipeline due to no more good datanodes being available to try. (Nodes: current=[DatanodeInfoWithStorage[10.333.444.555:50010,DS-3c7777ed-93f4-47b6-9c23-1426f7a6acdc,DISK], DatanodeInfoWithStorage[10.222.666.777:50010,DS-f9c983b4-1f10-4d5e-8983-490ece56c772,DISK]], original=[DatanodeInfoWithStorage[10.333.444.555:50010,DS-3c7777ed-93f4-47b6-9c23-1426f7a6acdc,DISK], DatanodeInfoWithStorage[10.222.666.777:50010,DS-f9c983b4-1f10-4d5e-8983-490ece56c772,DISK]]). The current failed datanode replacement policy is DEFAULT, and a client may configure this via 'dfs.client.block.write.replace-datanode-on-failure.policy' in its configuration.
> 	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.findNewDatanode(DFSOutputStream.java:951)
> {noformat}
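> As background (this is a client-side workaround, not the fix proposed below), on clusters that genuinely run fewer than three datanodes the replacement policy named in the exception can be relaxed. A minimal sketch using the stock Hadoop Configuration API and the standard HDFS client keys; the NEVER value is an assumption about what suits a small test cluster, since it trades durability for availability:
> {code:java}
> import org.apache.hadoop.conf.Configuration;
>
> public class ReplaceDatanodePolicyExample {
>   public static void main(String[] args) {
>     Configuration conf = new Configuration();
>     // Keep pipeline recovery enabled, but never demand a replacement
>     // datanode when one fails. Only reasonable when the cluster has
>     // fewer datanodes than the replication factor.
>     conf.setBoolean("dfs.client.block.write.replace-datanode-on-failure.enable", true);
>     conf.set("dfs.client.block.write.replace-datanode-on-failure.policy", "NEVER");
>   }
> }
> {code}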
> One proposal is to implement logic similar to FSHLog: if an IOException is
> thrown during the sync loop in WALProcedureStore#start(), instead of
> aborting immediately, we could try to roll the log and see whether that
> resolves the issue; if the new log cannot be created, or rolling the log
> throws further exceptions, we then abort.
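> A minimal sketch of that proposal (method names such as syncSlots and rollWriter are illustrative stand-ins, not the actual WALProcedureStore API):
> {code:java}
> import java.io.IOException;
>
> /** Illustrative only: roll the WAL once before aborting on a sync failure. */
> abstract class SyncLoopSketch {
>   abstract boolean isRunning();
>   abstract void syncSlots() throws IOException;     // flush pending sync slots to the WAL
>   abstract boolean rollWriter() throws IOException; // open a new WAL file on a fresh pipeline
>   abstract void abortProcess();                     // ask the master to abort
>
>   void syncLoop() {
>     while (isRunning()) {
>       try {
>         syncSlots();
>       } catch (IOException e) {
>         // A new WAL file gets a fresh HDFS write pipeline, which may
>         // succeed even after the old pipeline lost a datanode.
>         if (!tryRoll()) {
>           abortProcess(); // abort only when the roll also fails
>           return;
>         }
>       }
>     }
>   }
>
>   private boolean tryRoll() {
>     try {
>       return rollWriter();
>     } catch (IOException e) {
>       return false;
>     }
>   }
> }
> {code}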