[ 
https://issues.apache.org/jira/browse/HADOOP-13655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15671870#comment-15671870
 ] 

ASF GitHub Bot commented on HADOOP-13655:
-----------------------------------------

Github user liuml07 commented on a diff in the pull request:

    https://github.com/apache/hadoop/pull/131#discussion_r88332309
  
    --- Diff: hadoop-tools/hadoop-distcp/src/site/markdown/DistCp.md.vm ---
    @@ -470,6 +470,105 @@ $H3 SSL Configurations for HSFTP sources
     
       The SSL configuration file must be in the class-path of the DistCp program.
     
    +$H3 DistCp and Object Stores
    +
    +DistCp works with Object Stores such as Amazon S3, Azure WASB and OpenStack Swift.
    +
    +Prequisites
    +
    +1. The JAR containing the object store implementation is on the classpath,
    +along with all of its dependencies.
    +1. Unless the JAR automatically registers its bundled filesystem clients,
    +the configuration may need to be modified to state the class which
    +implements the filesystem schema. All of the ASF's own object store clients
    +are self-registering.
    +1. The relevant object store access credentials must be available in the cluster
    +configuration, or be otherwise available in all cluster hosts.
    +
    +DistCp can be used to upload data
    +
    +```bash
    +hadoop distcp hdfs://nn1:8020/datasets/set1 s3a://bucket/datasets/set1
    +```
    +
    +To download data
    +
    +```bash
    +hadoop distcp s3a://bucket/generated/results hdfs://nn1:8020/results
    +```
    +
    +To copy data between object stores
    +
    +```bash
    +hadoop distcp s3a://bucket/generated/results \
    +  wasb://upda...@example.blob.core.windows.net
    +```
    +
    +And do copy data within an object store
    +
    +```bash
    +hadoop distcp wasb://upda...@example.blob.core.windows.net/current \
    +  wasb://upda...@example.blob.core.windows.net/old
    +```
    +
    +And to use `-update` to only copy changed files.
    +
    +```bash
    +hadoop distcp -update -numListstatusThreads 20  \
    +  swift://history.cluster1/2016 \
    +  hdfs://nn1:8020/history/2016
    +```
    +
    +Because object stores are slow to list files, consider setting the `-numListstatusThreads` option when performing a `-update` operation
    +on a large directory tree (the limit is 40 threads).
    +
    +When `DistCp -update` is used with objec stores,
    --- End diff --
    
    objec -> object


> document object store use with fs shell and distcp
> --------------------------------------------------
>
>                 Key: HADOOP-13655
>                 URL: https://issues.apache.org/jira/browse/HADOOP-13655
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: documentation, fs, fs/s3
>    Affects Versions: 2.7.3
>            Reporter: Steve Loughran
>            Assignee: Steve Loughran
>
> There's no specific docs for working with object stores from the {{hadoop 
> fs}} shell or in distcp; people either suffer from this (performance, 
> billing), or learn through trial and error what to do.
> Add a section in both fs shell and distcp docs covering use with object 
> stores.


