[
https://issues.apache.org/jira/browse/HBASE-20218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16404433#comment-16404433
]
Hadoop QA commented on HBASE-20218:
-----------------------------------
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 30s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green} 0m 0s{color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s{color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 20s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 4m 52s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 34s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 31s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green} 6m 22s{color} | {color:green} branch has no errors when building our shaded downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 18s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 43s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 20s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 4m 49s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 25s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 2m 25s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 18s{color} | {color:red} hbase-mapreduce: The patch generated 3 new + 15 unchanged - 0 fixed = 18 total (was 15) {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green} 4m 57s{color} | {color:green} patch has no errors when building our shaded downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 19m 14s{color} | {color:green} Patch does not cause any errors with Hadoop 2.6.5 2.7.4 or 3.0.0. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 37s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 43s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 119m 18s{color} | {color:green} hbase-server in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 17m 18s{color} | {color:green} hbase-mapreduce in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 49s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black}187m 16s{color} | {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:eee3b01 |
| JIRA Issue | HBASE-20218 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12915085/HBASE-20218.patch |
| Optional Tests | asflicense javac javadoc unit findbugs shadedjars hadoopcheck hbaseanti checkstyle compile |
| uname | Linux b7e95713b041 3.13.0-139-generic #188-Ubuntu SMP Tue Jan 9 14:43:09 UTC 2018 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/component/dev-support/hbase-personality.sh |
| git revision | master / 3f906badbe |
| maven | version: Apache Maven 3.5.3 (3383c37e1f9e9b3bc3df5050c29c8aff9f295297; 2018-02-24T19:49:05Z) |
| Default Java | 1.8.0_151 |
| findbugs | v3.1.0-RC3 |
| checkstyle | https://builds.apache.org/job/PreCommit-HBASE-Build/12012/artifact/patchprocess/diff-checkstyle-hbase-mapreduce.txt |
| Test Results | https://builds.apache.org/job/PreCommit-HBASE-Build/12012/testReport/ |
| Max. process+thread count | 4532 (vs. ulimit of 10000) |
| modules | C: hbase-server hbase-mapreduce U: . |
| Console output | https://builds.apache.org/job/PreCommit-HBASE-Build/12012/console |
| Powered by | Apache Yetus 0.7.0 http://yetus.apache.org |
This message was automatically generated.
> Proposed Performance Enhancements For TableSnapshotInputFormat
> --------------------------------------------------------------
>
> Key: HBASE-20218
> URL: https://issues.apache.org/jira/browse/HBASE-20218
> Project: HBase
> Issue Type: Improvement
> Components: mapreduce
> Affects Versions: 1.4.0
> Environment: HBase 1.4.0 running in AWS EMR 5.12.0 with the HBase
> rootdir set to a folder in S3
>
> Reporter: Saad Mufti
> Priority: Minor
> Attachments: HBASE-20218.patch
>
>
> I have been testing a few Spark jobs we have at my company which work off of
> TableSnapshotInputFormat to read directly from the filesystem snapshots
> created on another EMR/HBase cluster and stored in S3. During performance
> testing I found various small changes which would greatly enhance performance.
> Right now we are running our jobs linked with a patched version of HBase
> 1.4.0 in which I made these changes, and I am hoping to submit my patch for
> review and eventual acceptance into the main codebase.
>
> The list of changes is:
>
> 1. a flag to control whether the snapshot restore uses a UUID-based random
> temp dir in the specified restore directory. We use the flag to turn this off
> so that we can benefit from an AWS S3-specific bucket partitioning scheme we
> have provisioned. The way S3 partitioning works, you have to give a fixed
> path prefix and a pattern of files after that, and AWS can then partition on
> the paths after the fixed prefix into different resources to get more
> parallelization. We were advised by AWS that we could only get this good
> partitioning behavior if we didn't have that random directory in there.
>
> 2. a flag to turn off the code that tries to compute locality information
> for the splits. This is useless when dealing with S3 since the files are not
> on the cluster so there is no use in computing locality; and worse yet, it
> uses a single thread in the driver to iterate over all the files in the
> restored snapshot. For a very large table this was taking hours and hours
> iterating through S3 objects just to list them (about 360,000 of them for
> our specific table).
>
> 3. a flag to override the column family schema setting to prefetch regions on
> open. Prefetching was causing the main executor thread on which a Spark task
> was running, which was trying to read through HFiles for its scan, to compete
> for a lock on the underlying EMRFS stream object with prefetch threads trying
> to read the same file. Most tasks in the Spark stage would finish, but the
> last few would linger half an hour or more, alternately competing with the
> prefetch threads for that lock. This is the only change that had to be
> outside the mapreduce package as it directly affects the prefetch behavior
> in CacheConfig.java.
>
> 4. a flag to turn off maintenance of Scan metrics. This was also causing a
> major slowdown; getting rid of it sped things up 4-5 times. What I observed
> in the thread dumps was that every call to update scan metrics was trying to
> get some HBase counter object and deep underneath was trying to access some
> Java resource bundle, and throwing an exception that it wasn't found. The
> exception was never visible at the application level and was swallowed
> underneath, but whatever it was doing was causing a major slowdown. So we use
> this flag to avoid collecting those metrics because we never used them.
>
> I am polishing my patch a bit more and hopefully will attach it tomorrow. One
> caveat, I tried but struggled with how to write any useful unit/component
> tests for these as these are invisible behaviors that do not affect the final
> result at all. And I am not that familiar with the HBase testing standards,
> so for now I am looking for guidance on what to test.
>
> Would appreciate any feedback plus guidance on writing tests, provided of
> course there is interest in incorporating my patch into the main codebase.
>
> Cheers.
>
> ----Saad
>
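For readers skimming the description above, the four proposed switches could be pictured roughly as boolean configuration keys. The following is only a minimal sketch in plain Java, not the contents of HBASE-20218.patch: the class name, the key names, and the choice to default every flag to the existing behavior are all assumptions made for illustration, and a plain Map stands in for the real HBase configuration object.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

// Illustrative only: hypothetical names, not taken from the actual patch.
final class SnapshotReadFlags {
    // Hypothetical keys, one per change listed in the issue description.
    static final String USE_UUID_RESTORE_DIR = "example.snapshot.restore.uuid.dir.enabled";
    static final String COMPUTE_LOCALITY = "example.snapshot.locality.enabled";
    static final String PREFETCH_ON_OPEN = "example.snapshot.prefetch.on.open.enabled";
    static final String SCAN_METRICS = "example.snapshot.scan.metrics.enabled";

    // Reads a boolean flag, falling back to the given default when unset.
    static boolean flag(Map<String, String> conf, String key, boolean defaultValue) {
        String value = conf.get(key);
        return value == null ? defaultValue : Boolean.parseBoolean(value);
    }

    // Change 1: append a random UUID directory unless the flag is switched off,
    // so that a fixed path prefix can be kept for S3 partitioning.
    // Assumption: the default (true) keeps the current UUID behavior.
    static String resolveRestoreDir(Map<String, String> conf, String restoreDir) {
        if (flag(conf, USE_UUID_RESTORE_DIR, true)) {
            return restoreDir + "/" + UUID.randomUUID();
        }
        return restoreDir;
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put(USE_UUID_RESTORE_DIR, "false");
        conf.put(COMPUTE_LOCALITY, "false");
        // Prints the fixed prefix unchanged: s3://example-bucket/restore
        System.out.println(resolveRestoreDir(conf, "s3://example-bucket/restore"));
        // Prints false, so a caller would skip the per-file locality listing.
        System.out.println(flag(conf, COMPUTE_LOCALITY, true));
    }
}
```

Defaulting each flag to the current behavior would mean jobs that never set these keys see no change, which is one plausible way to keep such a patch low-risk; whether the real patch does this is not stated in the description.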
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)