[ https://issues.apache.org/jira/browse/HADOOP-13655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15671870#comment-15671870 ]
ASF GitHub Bot commented on HADOOP-13655:
-----------------------------------------

Github user liuml07 commented on a diff in the pull request:

    https://github.com/apache/hadoop/pull/131#discussion_r88332309

    --- Diff: hadoop-tools/hadoop-distcp/src/site/markdown/DistCp.md.vm ---
    @@ -470,6 +470,105 @@ $H3 SSL Configurations for HSFTP sources
     The SSL configuration file must be in the class-path of the DistCp program.
    +$H3 DistCp and Object Stores
    +
    +DistCp works with Object Stores such as Amazon S3, Azure WASB and OpenStack Swift.
    +
    +Prequisites
    +
    +1. The JAR containing the object store implementation is on the classpath,
    +along with all of its dependencies.
    +1. Unless the JAR automatically registers its bundled filesystem clients,
    +the configuration may need to be modified to state the class which
    +implements the filesystem schema. All of the ASF's own object store clients
    +are self-registering.
    +1. The relevant object store access credentials must be available in the cluster
    +configuration, or be otherwise available in all cluster hosts.
    +
    +DistCp can be used to upload data
    +
    +```bash
    +hadoop distcp hdfs://nn1:8020/datasets/set1 s3a://bucket/datasets/set1
    +```
    +
    +To download data
    +
    +```bash
    +hadoop distcp s3a://bucket/generated/results hdfs://nn1:8020/results
    +```
    +
    +To copy data between object stores
    +
    +```bash
    +hadoop distcp s3a://bucket/generated/results \
    +  wasb://upda...@example.blob.core.windows.net
    +```
    +
    +And do copy data within an object store
    +
    +```bash
    +hadoop distcp wasb://upda...@example.blob.core.windows.net/current \
    +  wasb://upda...@example.blob.core.windows.net/old
    +```
    +
    +And to use `-update` to only copy changed files.
    +
    +```bash
    +hadoop distcp -update -numListstatusThreads 20 \
    +  swift://history.cluster1/2016 \
    +  hdfs://nn1:8020/history/2016
    +```
    +
    +Because object stores are slow to list files, consider setting the `-numListstatusThreads` option when performing a `-update` operation
    +on a large directory tree (the limit is 40 threads).
    +
    +When `DistCp -update` is used with objec stores,
    --- End diff --

    objec -> object


> document object store use with fs shell and distcp
> --------------------------------------------------
>
>                 Key: HADOOP-13655
>                 URL: https://issues.apache.org/jira/browse/HADOOP-13655
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: documentation, fs, fs/s3
>    Affects Versions: 2.7.3
>            Reporter: Steve Loughran
>            Assignee: Steve Loughran
>
> There's no specific docs for working with object stores from the {{hadoop
> fs}} shell or in distcp; people either suffer from this (performance,
> billing), or learn through trial and error what to do.
> Add a section in both fs shell and distcp docs covering use with object
> stores.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)