[ https://issues.apache.org/jira/browse/HBASE-28686?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17867414#comment-17867414 ]
Hudson commented on HBASE-28686:
--------------------------------
Results for branch master
[build #1125 on
builds.a.o|https://ci-hbase.apache.org/job/HBase%20Nightly/job/master/1125/]:
(x) *{color:red}-1 overall{color}*
----
details (if available):
(/) {color:green}+1 general checks{color}
-- For more information [see general
report|https://ci-hbase.apache.org/job/HBase%20Nightly/job/master/1125/General_20Nightly_20Build_20Report/]
(/) {color:green}+1 jdk17 hadoop3 checks{color}
-- For more information [see jdk17
report|https://ci-hbase.apache.org/job/HBase%20Nightly/job/master/1125/JDK17_20Nightly_20Build_20Report_20_28Hadoop3_29/]
(/) {color:green}+1 source release artifact{color}
-- See build output for details.
(x) {color:red}-1 client integration test{color}
-- Something went wrong with this stage, [check relevant console
output|https://ci-hbase.apache.org/job/HBase%20Nightly/job/master/1125//console].
> MapReduceBackupCopyJob should support custom DistCp options
> -----------------------------------------------------------
>
> Key: HBASE-28686
> URL: https://issues.apache.org/jira/browse/HBASE-28686
> Project: HBase
> Issue Type: Improvement
> Affects Versions: 2.6.0
> Reporter: Ray Mattingly
> Assignee: Ray Mattingly
> Priority: Major
> Labels: pull-request-available
> Fix For: 4.0.0-alpha-1, 3.0.0-beta-2
>
>
> h4. Problem
> The MapReduceBackupCopyJob class provides no means for updating DistCp job
> options. This means that you're stuck with defaults, which isn't always
> desirable. For example, my workplace would like the freedom to deviate from
> at least two DistCp defaults:
> # distcp.direct.write — we would like to set this to true, because writing
> and renaming tmp files is expensive in S3 (where we store our backups).
> # we would also like control over the number of mappers that DistCp will run
> h4. Proposed Solution
> It is not the prettiest solution, but I'm proposing that we support DistCp
> customizations via the given backup client configuration like
> [this.|https://github.com/HubSpot/hbase/compare/hubspot-2.6...HubSpot:hbase:backup-distcp-options]
> It's necessary to do this conf -> arg conversion because we still want to
> use [DistCp's run
> method|https://github.com/HubSpot/hadoop/blob/c4c25b0ea2be1c8bca31d86962597060b2630f62/hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DistCp.java#L134-L171],
> which expects args, so as to not change any error codes. Hadoop actually
> does something similar, but in the opposite direction — the DistCp job has
> logic to convert the args back to configurations (lol).
> Further, the DistCp API is poorly designed for programmatic use, so it
> doesn't leave us great alternatives. For example, if you use the run method,
> it doesn't matter what you pass in as DistCpOptions to the constructor: your
> options will be overwritten based on the args that you pass in.
> Alternatively, if you pass the DistCpOptions to the constructor and use
> DistCp#execute or DistCp#createAndSubmitJob, then you get none of the
> error specificity that the run method's error codes provide!
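
To make the proposed conf -> arg conversion concrete, here is a minimal sketch of the idea, not the code from the linked branch. It uses a plain Map as a stand-in for the backup client configuration, and the two configuration keys are hypothetical names invented for this illustration. The -direct and -m flags are the DistCp command-line options for direct writes and the maximum number of maps.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

final class DistCpArgsSketch {
  // Hypothetical configuration keys, made up for this sketch; the real
  // patch may use different names.
  static final String DIRECT_WRITE_KEY = "hbase.backup.distcp.direct.write";
  static final String MAX_MAPS_KEY = "hbase.backup.distcp.max.maps";

  /**
   * Converts configuration entries into DistCp-style command-line args,
   * followed by the source paths and the target path.
   * The Map stands in for a Hadoop Configuration object.
   */
  static List<String> toDistCpArgs(Map<String, String> conf,
                                   List<String> sources,
                                   String target) {
    List<String> args = new ArrayList<>();
    // A missing key parses as false, so the DistCp default is kept.
    if (Boolean.parseBoolean(conf.get(DIRECT_WRITE_KEY))) {
      args.add("-direct");
    }
    String maxMaps = conf.get(MAX_MAPS_KEY);
    if (maxMaps != null) {
      // Parse first so a malformed value fails here, before job submission.
      args.add("-m");
      args.add(String.valueOf(Integer.parseInt(maxMaps.trim())));
    }
    args.addAll(sources);
    args.add(target);
    return args;
  }
}
```

The resulting list would then be turned into a String[] and handed to DistCp's run method, so the existing error codes are left untouched. Options that are not set in the configuration produce no args, which keeps the DistCp defaults in place.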