[
https://issues.apache.org/jira/browse/HBASE-29432?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17995815#comment-17995815
]
Hudson commented on HBASE-29432:
--------------------------------
Results for branch branch-3
[build #436 on builds.a.o|https://ci-hbase.apache.org/job/HBase%20Nightly/job/branch-3/436/]:
(/) *{color:green}+1 overall{color}*
----
details (if available):
(/) {color:green}+1 general checks{color}
-- For more information [see general report|https://ci-hbase.apache.org/job/HBase%20Nightly/job/branch-3/436/General_20Nightly_20Build_20Report/]
(/) {color:green}+1 jdk17 hadoop3 checks{color}
-- For more information [see jdk17 report|https://ci-hbase.apache.org/job/HBase%20Nightly/job/branch-3/436/JDK17_20Nightly_20Build_20Report_20_28Hadoop3_29/]
(/) {color:green}+1 jdk17 hadoop 3.3.5 backward compatibility checks{color}
-- For more information [see jdk17 report|https://ci-hbase.apache.org/job/HBase%20Nightly/job/branch-3/436/JDK17_20Nightly_20Build_20Report_20_28Hadoop3_29/]
(/) {color:green}+1 jdk17 hadoop 3.3.6 backward compatibility checks{color}
-- For more information [see jdk17 report|https://ci-hbase.apache.org/job/HBase%20Nightly/job/branch-3/436/JDK17_20Nightly_20Build_20Report_20_28Hadoop3_29/]
(/) {color:green}+1 jdk17 hadoop 3.4.0 backward compatibility checks{color}
-- For more information [see jdk17 report|https://ci-hbase.apache.org/job/HBase%20Nightly/job/branch-3/436/JDK17_20Nightly_20Build_20Report_20_28Hadoop3_29/]
(/) {color:green}+1 source release artifact{color}
-- See build output for details.
(/) {color:green}+1 client integration test for 3.3.5 {color}
(/) {color:green}+1 client integration test for 3.3.6 {color}
(/) {color:green}+1 client integration test for 3.4.0 {color}
(/) {color:green}+1 client integration test for 3.4.1 {color}
> ExportSnapshot should support rack-awareness
> --------------------------------------------
>
> Key: HBASE-29432
> URL: https://issues.apache.org/jira/browse/HBASE-29432
> Project: HBase
> Issue Type: Improvement
> Reporter: Charles Connell
> Assignee: Charles Connell
> Priority: Minor
> Labels: pull-request-available
> Fix For: 2.7.0, 3.0.0-beta-2, 2.6.3
>
>
> At my company we are using ExportSnapshot to copy HBase table snapshots to
> S3, as a backup strategy. ExportSnapshot launches a MapReduce job to perform
> the copy. This means that data flows from the HBase cluster's DataNodes, to a
> YARN cluster's nodes, and then to S3.
> We are running HBase and YARN in AWS. AWS charges a fee for
> cross-availability-zone network traffic, but not for same-availability-zone
> traffic. If we could make the DataNode -> YARN node traffic not cross
> availability zones, backups would be considerably cheaper.
> I propose to make ExportSnapshot accept two plugins: a CustomFileGrouper and
> a FileLocationResolver. Here's what they will look like:
> {code}
> /**
>  * If desired, you may implement a CustomFileGrouper in order to influence how ExportSnapshot
>  * chooses which input files go into the MapReduce job's {@link InputSplit}s. Your implementation
>  * must return a data structure that contains each input file exactly once. Files that appear in
>  * separate entries in the top-level returned Collection are guaranteed to not be placed in the
>  * same InputSplit.
>  * This can be used to segregate your input files by the rack or host on which they are available,
>  * which, used in conjunction with {@link FileLocationResolver}, can improve the performance
>  * of your ExportSnapshot runs.
>  * To use this, pass the --custom-file-grouper argument with the fully qualified class name of
>  * an implementation of CustomFileGrouper that's on the classpath.
>  * If this argument is not used, no particular grouping logic will be applied.
>  */
> public interface CustomFileGrouper {
>   Collection<Collection<Pair<SnapshotFileInfo, Long>>> getGroupedInputFiles(
>     final Collection<Pair<SnapshotFileInfo, Long>> snapshotFiles);
> }
>
> /**
>  * If desired, you may implement a FileLocationResolver in order to influence the _location_
>  * metadata attached to each {@link InputSplit} that ExportSnapshot will submit to YARN. The
>  * {@link #getLocationsForInputFiles(Collection)} method is called once for each InputSplit
>  * being constructed. Whatever is returned will ultimately be reported by that split's
>  * {@link InputSplit#getLocations()} method. This can be used to encourage YARN to schedule
>  * the ExportSnapshot's mappers on rack-local or host-local NodeManagers.
>  * To use this, pass the --file-location-resolver argument with the fully qualified class name of
>  * an implementation of FileLocationResolver that's on the classpath.
>  * If this argument is not used, no locations will be attached to the InputSplits.
>  */
> public interface FileLocationResolver {
>   Set<String> getLocationsForInputFiles(final Collection<Pair<SnapshotFileInfo, Long>> files);
> }
> {code}
> Users can optionally provide implementations of these interfaces on their
> classpath, and tell ExportSnapshot to use them via new options. By default,
> there will be no change in behavior. If users choose to implement these
> plugins, they can influence ExportSnapshot to be topology-aware in a very
> flexible way. I plan to write my own plugins optimized for AWS pricing, but
> that won't be the only way this can be used.
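
To make the two plugin contracts described above concrete, here is a minimal, self-contained sketch. It is not the actual HBase code: {{FileAndSize}} is a hypothetical stand-in for {{Pair<SnapshotFileInfo, Long>}}, the two nested interfaces are simplified mirrors of the proposed ones, and {{StaticLocationPlugin}} with its hard-coded file-to-location table (labels such as "rack-a") is an assumption made purely for illustration. A real plugin would have to discover where each file's blocks actually live.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

final class LocationAwareExportSketch {

  /** Hypothetical stand-in for Pair<SnapshotFileInfo, Long>: a file name and its size in bytes. */
  static final class FileAndSize {
    final String name;
    final long sizeBytes;

    FileAndSize(String name, long sizeBytes) {
      this.name = name;
      this.sizeBytes = sizeBytes;
    }
  }

  /** Simplified mirror of the proposed CustomFileGrouper (assumption, not the HBase type). */
  interface CustomFileGrouper {
    Collection<Collection<FileAndSize>> getGroupedInputFiles(Collection<FileAndSize> snapshotFiles);
  }

  /** Simplified mirror of the proposed FileLocationResolver (assumption, not the HBase type). */
  interface FileLocationResolver {
    Set<String> getLocationsForInputFiles(Collection<FileAndSize> files);
  }

  /** Implements both plugins from one static, hypothetical file-to-location table. */
  static final class StaticLocationPlugin implements CustomFileGrouper, FileLocationResolver {
    private final Map<String, String> locationByFile;

    StaticLocationPlugin(Map<String, String> locationByFile) {
      this.locationByFile = locationByFile;
    }

    @Override
    public Collection<Collection<FileAndSize>> getGroupedInputFiles(
        Collection<FileAndSize> snapshotFiles) {
      // One group per location, in first-seen order. Files with no known
      // location share the "" group, so every input file appears exactly once.
      Map<String, Collection<FileAndSize>> groups = new LinkedHashMap<>();
      for (FileAndSize file : snapshotFiles) {
        String location = locationByFile.getOrDefault(file.name, "");
        groups.computeIfAbsent(location, key -> new ArrayList<>()).add(file);
      }
      return new ArrayList<>(groups.values());
    }

    @Override
    public Set<String> getLocationsForInputFiles(Collection<FileAndSize> files) {
      // Sorted set of the known locations; files without one contribute nothing.
      Set<String> locations = new TreeSet<>();
      for (FileAndSize file : files) {
        String location = locationByFile.get(file.name);
        if (location != null) {
          locations.add(location);
        }
      }
      return locations;
    }
  }

  public static void main(String[] args) {
    Map<String, String> table = new LinkedHashMap<>();
    table.put("hfile-1", "rack-a");
    table.put("hfile-2", "rack-b");
    table.put("hfile-3", "rack-a");
    StaticLocationPlugin plugin = new StaticLocationPlugin(table);

    List<FileAndSize> files = List.of(
        new FileAndSize("hfile-1", 100L), new FileAndSize("hfile-2", 200L),
        new FileAndSize("hfile-3", 300L), new FileAndSize("hfile-4", 400L));

    // Each group would become the input of separate InputSplits; the resolver
    // then supplies the location hints for the files in that group.
    for (Collection<FileAndSize> group : plugin.getGroupedInputFiles(files)) {
      System.out.println(
          group.size() + " file(s), locations " + plugin.getLocationsForInputFiles(group));
    }
  }
}
```

In this sketch one class serves both roles so that the grouping and the location hints cannot disagree. Whether a real deployment should report racks, hosts, or something else as locations depends on what the YARN scheduler in that cluster can match against, which this example does not attempt to answer.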