[ https://issues.apache.org/jira/browse/HBASE-22301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16827428#comment-16827428 ]
HBase QA commented on HBASE-22301:
----------------------------------
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 14m 13s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 1s{color} | {color:blue} Findbugs executables are not available. {color} |
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green} 0m 0s{color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 2 new or modified test files. {color} |
|| || || || {color:brown} branch-1 Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 13s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 46s{color} | {color:green} branch-1 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 58s{color} | {color:green} branch-1 passed with JDK v1.8.0_212 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 5s{color} | {color:green} branch-1 passed with JDK v1.7.0_222 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 46s{color} | {color:green} branch-1 passed {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green} 2m 48s{color} | {color:green} branch has no errors when building our shaded downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 50s{color} | {color:green} branch-1 passed with JDK v1.8.0_212 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 1s{color} | {color:green} branch-1 passed with JDK v1.7.0_222 {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 13s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 42s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 59s{color} | {color:green} the patch passed with JDK v1.8.0_212 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 59s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 5s{color} | {color:green} the patch passed with JDK v1.7.0_222 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 5s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 11s{color} | {color:green} The patch passed checkstyle in hbase-hadoop-compat {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 13s{color} | {color:green} The patch passed checkstyle in hbase-hadoop2-compat {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 23s{color} | {color:green} hbase-server: The patch generated 0 new + 94 unchanged - 6 fixed = 94 total (was 100) {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green} 2m 54s{color} | {color:green} patch has no errors when building our shaded downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 1m 43s{color} | {color:green} Patch does not cause any errors with Hadoop 2.7.4. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 48s{color} | {color:green} the patch passed with JDK v1.8.0_212 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 1s{color} | {color:green} the patch passed with JDK v1.7.0_222 {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 22s{color} | {color:green} hbase-hadoop-compat in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 28s{color} | {color:green} hbase-hadoop2-compat in the patch passed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red}112m 48s{color} | {color:red} hbase-server in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 54s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black}152m 9s{color} | {color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.hbase.security.access.TestAdminOnlyOperations |
| | hadoop.hbase.TestZooKeeper |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce base: https://builds.apache.org/job/PreCommit-HBASE-Build/204/artifact/patchprocess/Dockerfile |
| JIRA Issue | HBASE-22301 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12967197/HBASE-22301-branch-1.patch |
| Optional Tests | dupname asflicense javac javadoc unit findbugs shadedjars hadoopcheck hbaseanti checkstyle compile |
| uname | Linux cdedc1cd4657 4.4.0-143-generic #169~14.04.2-Ubuntu SMP Wed Feb 13 15:00:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | dev-support/hbase-personality.sh |
| git revision | branch-1 / 5ea7851 |
| maven | version: Apache Maven 3.0.5 |
| Default Java | 1.7.0_222 |
| Multi-JDK versions | /usr/lib/jvm/java-8-openjdk-amd64:1.8.0_212 /usr/lib/jvm/java-7-openjdk-amd64:1.7.0_222 |
| unit | https://builds.apache.org/job/PreCommit-HBASE-Build/204/artifact/patchprocess/patch-unit-hbase-server.txt |
| Test Results | https://builds.apache.org/job/PreCommit-HBASE-Build/204/testReport/ |
| Max. process+thread count | 3727 (vs. ulimit of 10000) |
| modules | C: hbase-hadoop-compat hbase-hadoop2-compat hbase-server U: . |
| Console output | https://builds.apache.org/job/PreCommit-HBASE-Build/204/console |
| Powered by | Apache Yetus 0.9.0 http://yetus.apache.org |
This message was automatically generated.
> Consider rolling the WAL if the HDFS write pipeline is slow
> -----------------------------------------------------------
>
> Key: HBASE-22301
> URL: https://issues.apache.org/jira/browse/HBASE-22301
> Project: HBase
> Issue Type: Improvement
> Components: wal
> Reporter: Andrew Purtell
> Assignee: Andrew Purtell
> Priority: Minor
> Fix For: 3.0.0, 1.5.0, 2.3.0
>
> Attachments: HBASE-22301-branch-1.patch, HBASE-22301-branch-1.patch,
> HBASE-22301-branch-1.patch, HBASE-22301-branch-1.patch,
> HBASE-22301-branch-1.patch
>
>
> Consider the case when a subset of the HDFS fleet is unhealthy but suffering
> a gray failure, not an outright outage. HDFS operations, notably syncs, are
> abnormally slow on pipelines which include this subset of hosts. If the
> regionserver's WAL is backed by an impacted pipeline, all WAL handlers can be
> consumed waiting for acks from the datanodes in the pipeline (recall that
> some of them are sick). Imagine a write-heavy application distributing load
> uniformly over the cluster at a fairly high rate. With the WAL subsystem
> slowed by HDFS-level issues, all handlers can be blocked waiting to append to
> the WAL. Once all handlers are blocked, the application will experience
> backpressure. All (HBase) clients eventually have too many outstanding writes
> and block.
> Because the application is distributing writes near uniformly in the
> keyspace, the probability any given service endpoint will dispatch a request
> to an impacted regionserver, even a single regionserver, approaches 1.0. So
> the probability that all service endpoints will be affected approaches 1.0.
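As a rough illustration of the probability argument above: if requests are routed uniformly and independently, and k of N regionservers are impacted, then the chance that at least one of n requests from a given endpoint reaches an impacted server is 1 - (1 - k/N)^n. The independence assumption and the class and method names below are hypothetical; they are not taken from the issue or from HBase.

```java
/** Hypothetical helper illustrating the "approaches 1.0" argument. */
final class ImpactProbability {
  private ImpactProbability() {}

  /**
   * Probability that at least one of {@code requests} uniformly and
   * independently routed requests reaches an impacted regionserver.
   * Independence between requests is an assumption of this sketch.
   */
  static double atLeastOneHit(int impactedServers, int totalServers, long requests) {
    if (totalServers <= 0 || impactedServers < 0 || impactedServers > totalServers) {
      throw new IllegalArgumentException("invalid server counts");
    }
    // Chance that a single request misses every impacted server.
    double miss = 1.0 - (double) impactedServers / totalServers;
    // Chance that all requests miss, subtracted from 1.
    return 1.0 - Math.pow(miss, requests);
  }
}
```

With 1 impacted server out of 100, a single request has a 1% chance of reaching it, but across 1000 requests the chance exceeds 99.99%.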
> In order to break the logjam, we need to remove the slow datanodes. Although
> there are HDFS-level monitoring, mechanisms, and procedures for this, we
> should also attempt to take mitigating action at the HBase layer as soon as
> we find ourselves in trouble. It would be enough to remove the affected
> datanodes from the writer pipelines. A super simple strategy that can be
> effective is described below:
> This is with branch-1 code. I think branch-2's async WAL can mitigate this but
> can still be susceptible. The branch-2 sync WAL is susceptible.
> We already roll the WAL writer if the pipeline suffers the failure of a
> datanode and the replication factor on the pipeline is too low. We should
> also consider how much time it took for the write pipeline to complete a sync
> the last time we measured it, or the maximum over the interval since we last
> checked. If the sync time exceeds a configured threshold, roll the log writer
> then too. Fortunately we don't need to know which datanode is
> making the WAL write pipeline slow, only that syncs on the pipeline are too
> slow and exceeding a threshold. This is enough information to know when to
> roll it. Once we roll it, we will get three new randomly selected datanodes.
> On most clusters the probability the new pipeline includes the slow datanode
> will be low. (And if for some reason it does end up with a problematic
> datanode again, we roll again.)
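The check described above could look roughly like the following. This is a minimal sketch, not the code in the attached patch: SlowSyncRollPolicy, recordSync, and checkAndReset are hypothetical names, and a real WAL implementation would call something like this from its own sync and roll paths.

```java
import java.util.concurrent.TimeUnit;

/** Hypothetical helper: tracks sync latency and flags when a roll is warranted. */
final class SlowSyncRollPolicy {
  // Assumed to come from configuration; the actual key and default are not shown here.
  private final long slowSyncThresholdNanos;
  // Maximum sync duration observed since the last check.
  private long maxSyncNanosSinceLastCheck;

  SlowSyncRollPolicy(long slowSyncThresholdMs) {
    this.slowSyncThresholdNanos = TimeUnit.MILLISECONDS.toNanos(slowSyncThresholdMs);
  }

  /** Record how long one completed sync on the write pipeline took. */
  synchronized void recordSync(long syncNanos) {
    maxSyncNanosSinceLastCheck = Math.max(maxSyncNanosSinceLastCheck, syncNanos);
  }

  /**
   * Returns true if the slowest sync since the previous check exceeded the
   * threshold, then starts a new observation window. No per-datanode
   * information is needed, only the pipeline-level sync time.
   */
  synchronized boolean checkAndReset() {
    boolean tooSlow = maxSyncNanosSinceLastCheck > slowSyncThresholdNanos;
    maxSyncNanosSinceLastCheck = 0L;
    return tooSlow;
  }
}
```

A caller would invoke recordSync after each sync completes and checkAndReset periodically; a true result would lead to a log roll request, which in turn gives the writer a new pipeline.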
> This is not a silver bullet but this can be a reasonably effective mitigation.
> Provide a metric for tracking when log roll is requested (and for what
> reason).
> Emit a log line at log roll time that includes datanode pipeline details for
> further debugging and analysis, similar to the existing slow FSHLog sync log
> line.
> If we roll too many times within a short interval of time, this probably means
> there is a widespread problem with the fleet, so our mitigation is not
> helping and may be exacerbating those problems or operator difficulties.
> Ensure log roll requests triggered by this new feature happen infrequently
> enough to not cause difficulties under either normal or abnormal conditions.
> A very simple strategy that could work well under both normal and abnormal
> conditions is to define a fairly lengthy interval, default 5 minutes, and
> then ensure we do not roll more than once during this interval for this
> reason.
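The "no more than once per interval" rule above can be sketched as a small gate in front of the roll request. This is an illustrative assumption about how it might be structured, not the patch's implementation: SlowSyncRollLimiter and tryAcquire are hypothetical names, and the caller passes in the current time so the logic is easy to test.

```java
/** Hypothetical gate: allows at most one slow-sync roll per configured interval. */
final class SlowSyncRollLimiter {
  private final long minIntervalMs;
  private long lastRollMs;
  private boolean rolledBefore;

  /** @param minIntervalMs minimum spacing between rolls, e.g. 5 minutes = 300000 ms */
  SlowSyncRollLimiter(long minIntervalMs) {
    this.minIntervalMs = minIntervalMs;
  }

  /**
   * Returns true if a slow-sync roll may be requested at time {@code nowMs},
   * and records that time as the most recent roll. Returns false if the
   * previous roll for this reason happened less than one interval ago.
   */
  synchronized boolean tryAcquire(long nowMs) {
    if (rolledBefore && nowMs - lastRollMs < minIntervalMs) {
      return false; // too soon; skip the roll for this reason
    }
    rolledBefore = true;
    lastRollMs = nowMs;
    return true;
  }
}
```

Rolls requested for other reasons (for example, low replication on the pipeline) would bypass this gate, since the limit described above applies only to rolls triggered by slow syncs.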
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)