Sorry, I responded from my phone and the message bounced... I used distcp over hftp, running on the CDH3 cloud, to pull the data from the earlier cloud (default options).
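For reference, here is a minimal sketch of what such an invocation might look like. It assumes the host names and ports quoted further down in this thread (mc00001:50070 as the old cluster's dfs.http.address, mc00000:55310 as the new namenode), and /user/cluster is only an illustrative path. The helper name build_distcp_cmd is hypothetical, and the script only prints the command rather than running it.

```shell
#!/bin/sh
# Sketch only: prints a DistCp command line instead of running it.
# Hosts, ports and the path are borrowed from the quoted thread for
# illustration; they are assumptions, not necessarily the clusters
# used in the reply above.

# build_distcp_cmd SRC_HTTP_ADDR DST_NAMENODE_ADDR PATH
#   SRC_HTTP_ADDR      old cluster's dfs.http.address (default port 50070)
#   DST_NAMENODE_ADDR  new cluster's namenode host:port
#   PATH               absolute HDFS path to copy
build_distcp_cmd() {
  printf 'hadoop distcp hftp://%s%s hdfs://%s%s\n' "$1" "$3" "$2" "$3"
}

build_distcp_cmd "mc00001:50070" "mc00000:55310" "/user/cluster"
```

Because hftp is read-only, the printed command would be run from a node on the destination cluster, using that cluster's own hadoop binary, as the DistCp guide quoted below recommends.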
HTH

> From: [email protected]
> To: [email protected]
> Date: Tue, 8 Feb 2011 15:10:43 -0500
> Subject: RE: copying between hadoop instances
>
> Michael,
>
> Which worked for you? DistCp on the destination cluster or fs -cp? What
> options/protocols did you use?
>
> Thanks,
> Mike
> ________________________________________
> From: Michael Segel [[email protected]]
> Sent: Tuesday, February 08, 2011 2:57 PM
> To: [email protected]
> Subject: RE: copying between hadoop instances
>
> hadoop fsck /
>
> And yes, you should run it on the destination cluster.
>
> I've done this and it works for me....
>
> > From: [email protected]
> > To: [email protected]
> > Date: Tue, 8 Feb 2011 14:01:52 -0500
> > Subject: RE: copying between hadoop instances
> >
> > Same results. I think I'll have more luck with fs -cp. I think the error is
> > caused by the fact that my source DFS has 29 under-replicated blocks. How
> > can I get rid of these?
> >
> > Thanks,
> > Mike
> > ________________________________________
> > From: Vladimir Klimontovich [[email protected]]
> > Sent: Tuesday, February 08, 2011 1:49 PM
> > To: [email protected]
> > Subject: Re: copying between hadoop instances
> >
> > Maybe, API is backward compatible. Try to run the same command on
> > different node (it you ran in on mc00001, try mc00000)
> >
> > On Tue, Feb 8, 2011 at 8:50 PM, Korb, Michael [USA]
> > <[email protected]> wrote:
> > > I was unable to get the stacktrace. Is there a workaround for the
> > > incompatible APIs? I'm using hftp instead of hdfs because the DistCp
> > > guide (http://hadoop.apache.org/common/docs/r0.20.2/distcp.html) says,
> > > "For copying between two different versions of Hadoop, one will usually
> > > use HftpFileSystem. This is a read-only FileSystem, so DistCp must be run
> > > on the destination cluster (more specifically, on TaskTrackers that can
> > > write to the destination cluster). Each source is specified as
> > > hftp://<dfs.http.address>/<path> (the default dfs.http.address is
> > > <namenode>:50070)."
> > >
> > > Mike
> > > ________________________________________
> > > From: Vladimir Klimontovich [[email protected]]
> > > Sent: Tuesday, February 08, 2011 12:48 PM
> > > To: [email protected]
> > > Subject: Re: copying between hadoop instances
> > >
> > > Yes, new APIs between old and new version are incompatible.
> > >
> > > Did you managed to get stacktrace from
> > > http://mc00000.mcloud.bah.com:50075/streamFile?filename=/user/cluster/annotated/2009/07/05/_logs/history/mc00002_1291306280950_job_201012021111_0518_cluster_com.bah.mapred.CombineFilesDriver%253A+netflow-smallfi&ugi=hdfs
> > >
> > > And, by the way, why are you using htfp for source instead of hdfs://?
> > >
> > > On Tue, Feb 8, 2011 at 8:45 PM, Korb, Michael [USA]
> > > <[email protected]> wrote:
> > >> That address is to a file on the destination fs, but it didn't get
> > >> copied from the source. That is where fs -cp fails every time. Here's
> > >> what happens when I try distcp:
> > >>
> > >> sudo -u hdfs ./hadoop distcp -update hftp://mc00001:50070/
> > >> hdfs://mc00000:55310/
> > >>
> > >> 11/02/08 12:38:50 INFO tools.DistCp: srcPaths=[hftp://mc00001:50070/]
> > >> 11/02/08 12:38:50 INFO tools.DistCp: destPath=hdfs://mc00000:55310/
> > >> Exception in thread "main" java.lang.NoSuchMethodError:
> > >> org.apache.hadoop.mapred.JobConf.getCredentials()Lorg/apache/hadoop/security/Credentials;
> > >>     at org.apache.hadoop.tools.DistCp.checkSrcPath(DistCp.java:632)
> > >>     at org.apache.hadoop.tools.DistCp.copy(DistCp.java:656)
> > >>     at org.apache.hadoop.tools.DistCp.run(DistCp.java:881)
> > >>     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
> > >>     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
> > >>     at org.apache.hadoop.tools.DistCp.main(DistCp.java:908)
> > >>
> > >> I asked about this before and got a response from Ted Dunning. He said,
> > >> "This is due to the security API not being available. You are crossing
> > >> from a cluster with security to one without and that is causing
> > >> confusion. Presumably your client assumes that it is available and your
> > >> hadoop library doesn't provide it. Check your class path very carefully
> > >> looking for version assumptions and confusions."
> > >>
> > >> I don't know where to begin checking my class path for these things...
> > >> but perhaps if I could get distcp working it wouldn't run into the same
> > >> problems as fs -cp.
> > >>
> > >> Thanks,
> > >> Mike
> > >> ________________________________________
> > >> From: Vladimir Klimontovich [[email protected]]
> > >> Sent: Tuesday, February 08, 2011 12:24 PM
> > >> To: [email protected]
> > >> Subject: Re: copying between hadoop instances
> > >>
> > >> Try to go to
> > >> http://mc00000.mcloud.bah.com:50075/streamFile?filename=/user/cluster/annotated/2009/07/05/_logs/history/mc00002_1291306280950_job_201012021111_0518_cluster_com.bah.mapred.CombineFilesDriver%253A+netflow-smallfi&ugi=hdfs
> > >> and check if browser show you the stacktrace. If could give a lot of
> > >> information.
> > >>
> > >> And what's wrong with distcp (any stacktraces, error messages?)
> > >>
> > >> On Tue, Feb 8, 2011 at 8:06 PM, Korb, Michael [USA]
> > >> <[email protected]> wrote:
> > >>> I have two Hadoop instances running on one cluster of machines for the
> > >>> purpose of upgrading. I'm trying to copy all the files from the old
> > >>> instance to the new one but have been having trouble with both distcp
> > >>> and fs -cp.
> > >>>
> > >>> Most recently, I've been trying, "sudo -u hdfs ./hadoop fs -cp
> > >>> hftp://mc00001:50070/* hdfs://mc00000:55310/" where mc00001 is the
> > >>> namenode of old hadoop and mc00000 is the namenode of new hadoop.
> > >>>
> > >>> I've had some success with this command (some files have actually been
> > >>> copied), but part of the way through the copy, I get this error:
> > >>>
> > >>> cp: Server returned HTTP response code: 500 for URL:
> > >>> http://mc00000.mcloud.bah.com:50075/streamFile?filename=/user/cluster/annotated/2009/07/05/_logs/history/mc00002_1291306280950_job_201012021111_0518_cluster_com.bah.mapred.CombineFilesDriver%253A+netflow-smallfi&ugi=hdfs
> > >>>
> > >>> Is it possible that there could be permissions issues? It also doesn't
> > >>> seem quite right to be copying * since there are directories, but I
> > >>> don't think there's a way to call fs -cp recursively. Could this be
> > >>> causing problems?
> > >>>
> > >>> Thanks,
> > >>> Mike
> > >>
> > >> --
> > >> Vladimir Klimontovich
> > >> Cell: +7-926-890-2349, skype: klimontovich
> > >>
> > >
> > > --
> > > Vladimir Klimontovich
> > > Cell: +7-926-890-2349, skype: klimontovich
> > >
> >
> > --
> > Vladimir Klimontovich
> > Cell: +7-926-890-2349, skype: klimontovich
