Hi,
There is a distributed copy utility in Hadoop, distcp, which lets you
copy large amounts of data from one DFS to another. The exact syntax
for using this command is
hadoop distcp [OPTIONS] <srcurl>* <desturl>
OPTIONS:
-p[rbugp] Preserve status
r: replication number
b: block size
u: user
g: group
p: permission
-p alone is equivalent to -prbugp
-i Ignore failures
-log <logdir> Write logs to <logdir>
-overwrite Overwrite destination
-update Overwrite if src size different from dst size
-f <urilist_uri> Use list at <urilist_uri> as src list
NOTE: if -overwrite or -update are set, each source URI is
interpreted as an isomorphic update to an existing directory.
For example:
hadoop distcp -p -update "hdfs://A:8020/user/foo/bar"
"hdfs://B:8020/user/foo/baz"
would update all descendants of 'baz' also in 'bar'; it would
*not* update /user/foo/baz/bar
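For the question below, a minimal invocation would look like the
following (the hostnames, ports, and paths are placeholders; substitute
your own namenodes and directories):
hadoop distcp hdfs://A:8020/user/foo/data hdfs://B:8020/user/foo/data
This launches a job that copies /user/foo/data on cluster A into
/user/foo/data on cluster B.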
Generic options supported are:
-conf <configuration file> specify an application configuration file
-D <property=value> use value for given property
-fs <local|namenode:port> specify a namenode
-jt <local|jobtracker:port> specify a job tracker
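As a sketch (again with placeholder URIs), the copy options above can
be combined; for example, to re-copy only files whose size differs on
the destination and keep a log of what was copied:
hadoop distcp -update -log hdfs://B:8020/user/foo/logs \
    hdfs://A:8020/user/foo/data hdfs://B:8020/user/foo/data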
Under the hood, distcp runs as a MapReduce job, so the copy is
performed in parallel across the cluster.
Hope this helps.
Pratyush
[EMAIL PROTECTED] wrote:
Hi all,
I have a large dataset stored in a Hadoop cluster, and I want to
copy this data into another Hadoop cluster. Can anyone tell me how?
Thank you very much!
Best wishes!
maqiang