Hi,

Hadoop ships with a distributed copy utility, distcp, which lets you copy large amounts of data from one DFS to another. The syntax for this command is:

distcp [OPTIONS] <srcurl>* <desturl>

OPTIONS:
-p[rbugp]              Preserve status
                       r: replication number
                       b: block size
                       u: user
                       g: group
                       p: permission
                       -p alone is equivalent to -prbugp
-i                     Ignore failures
-log <logdir>          Write logs to <logdir>
-overwrite             Overwrite destination
-update                Overwrite if src size different from dst size
-f <urilist_uri>       Use list at <urilist_uri> as src list (see the example below)

NOTE: if -overwrite or -update are set, each source URI is
     interpreted as an isomorphic update to an existing directory.
For example:
hadoop distcp -p -update "hdfs://A:8020/user/foo/bar" "hdfs://B:8020/user/foo/baz"

    would update all descendants of 'baz' also in 'bar'; it would
    *not* update /user/foo/baz/bar
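
For a plain cluster-to-cluster copy, or to copy an explicit list of sources with -f, the invocations look like this (the hosts, ports, and paths are placeholders; substitute your own namenodes and directories):

hadoop distcp "hdfs://A:8020/user/foo/data" "hdfs://B:8020/user/foo/data"
hadoop distcp -f "hdfs://A:8020/user/foo/srclist" "hdfs://B:8020/user/foo/dest"

where the file srclist contains one source URI per line, e.g.

hdfs://A:8020/user/foo/data/a
hdfs://A:8020/user/foo/data/b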

The generic options supported are:
-conf <configuration file>     specify an application configuration file
-D <property=value>            use value for given property
-fs <local|namenode:port>      specify a namenode
-jt <local|jobtracker:port>    specify a job tracker
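
Generic options go before the distcp-specific arguments. As a minimal sketch (the property value and hosts are placeholders, and dfs.replication only takes effect if you are not preserving replication with -p):

hadoop distcp -D dfs.replication=3 "hdfs://A:8020/user/foo/bar" "hdfs://B:8020/user/foo/baz"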

Under the hood, distcp runs as a MapReduce job, so the copy is performed in parallel by many map tasks and its throughput scales with the cluster.
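
If the two clusters run different Hadoop versions, a plain hdfs://-to-hdfs:// copy can fail due to RPC incompatibilities. A common workaround (assuming your versions support it) is to run distcp on the destination cluster and read the source over its read-only HTTP interface, HFTP. The port below is the source namenode's HTTP port (50070 by default), so adjust it for your setup:

hadoop distcp "hftp://A:50070/user/foo/bar" "hdfs://B:8020/user/foo/bar"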

Hope this helps.

Pratyush



[EMAIL PROTECTED] wrote:
Hi all,
    I have a large dataset stored in a Hadoop cluster, and now I want
to copy this data from this Hadoop cluster into another Hadoop
cluster. Can anyone tell me how?
    Thank you very much!
    Best wishes!

maqiang
