Things to check:
1) when you launch distcp jobs, all the datanodes of the older hdfs are live and connected
2) when you launch distcp, no data is being written/moved/deleted in hdfs
3) you can use the -log option to log errors into a directory and use -i to ignore errors
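For example, something like this (just a sketch, not tested against your setup: /tmp/distcp_logs is a placeholder log directory, and hdfs://localhost:8021 assumes 8021 is the cdh3u3 namenode RPC port from your earlier attempt; run it from the cdh3 side, with hftp only as the read-only source):

$ hadoop distcp -log /tmp/distcp_logs -i hftp://localhost:50070/tmp hdfs://localhost:8021/tmp_copy

-log writes a per-file record of what the job attempted (the path is resolved against the default filesystem where you run it), so you can see exactly which copies failed, and -i lets the job keep going past individual failures.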
Also, you can try using distcp with the hdfs protocol instead of hftp... for more you can refer to
https://groups.google.com/a/cloudera.org/group/cdh-user/browse_thread/thread/d0d99ad9f1554edd
If it failed, there should be some error.

On Mon, May 7, 2012 at 4:44 PM, Austin Chungath <austi...@gmail.com> wrote:

> ok that was a lame mistake.
> $ hadoop distcp hftp://localhost:50070/tmp hftp://localhost:60070/tmp_copy
> I had spelled hdfs instead of "hftp"
>
> $ hadoop distcp hftp://localhost:50070/docs/index.html hftp://localhost:60070/user/hadoop
> 12/05/07 16:38:09 INFO tools.DistCp: srcPaths=[hftp://localhost:50070/docs/index.html]
> 12/05/07 16:38:09 INFO tools.DistCp: destPath=hftp://localhost:60070/user/hadoop
> With failures, global counters are inaccurate; consider running with -i
> Copy failed: java.io.IOException: Not supported
>   at org.apache.hadoop.hdfs.HftpFileSystem.delete(HftpFileSystem.java:457)
>   at org.apache.hadoop.tools.DistCp.fullyDelete(DistCp.java:963)
>   at org.apache.hadoop.tools.DistCp.copy(DistCp.java:672)
>   at org.apache.hadoop.tools.DistCp.run(DistCp.java:881)
>   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
>   at org.apache.hadoop.tools.DistCp.main(DistCp.java:908)
>
> Any idea why this error is coming?
> I am copying one file from 0.20.205 (/docs/index.html) to cdh3u3 (/user/hadoop)
>
> Thanks & Regards,
> Austin
>
> On Mon, May 7, 2012 at 3:57 PM, Austin Chungath <austi...@gmail.com> wrote:
>
> > Thanks,
> >
> > So I decided to try and move using distcp.
> >
> > $ hadoop distcp hdfs://localhost:54310/tmp hdfs://localhost:8021/tmp_copy
> > 12/05/07 14:57:38 INFO tools.DistCp: srcPaths=[hdfs://localhost:54310/tmp]
> > 12/05/07 14:57:38 INFO tools.DistCp: destPath=hdfs://localhost:8021/tmp_copy
> > With failures, global counters are inaccurate; consider running with -i
> > Copy failed: org.apache.hadoop.ipc.RPC$VersionMismatch: Protocol
> > org.apache.hadoop.hdfs.protocol.ClientProtocol version mismatch. (client = 63, server = 61)
> >
> > I found that we can do distcp like above only if both are of the same hadoop version.
> > So I tried:
> >
> > $ hadoop distcp hftp://localhost:50070/tmp hdfs://localhost:60070/tmp_copy
> > 12/05/07 15:02:44 INFO tools.DistCp: srcPaths=[hftp://localhost:50070/tmp]
> > 12/05/07 15:02:44 INFO tools.DistCp: destPath=hdfs://localhost:60070/tmp_copy
> >
> > But this process seemed to hang at this stage. What might I be doing wrong?
> >
> > hftp://<dfs.http.address>/<path>
> > hftp://localhost:50070 is dfs.http.address of 0.20.205
> > hdfs://localhost:60070 is dfs.http.address of cdh3u3
> >
> > Thanks and regards,
> > Austin
> >
> >
> > On Fri, May 4, 2012 at 4:30 AM, Michel Segel <michael_se...@hotmail.com> wrote:
> >
> >> Ok... So riddle me this...
> >> I currently have a replication factor of 3.
> >> I reset it to two.
> >>
> >> What do you have to do to get the replication factor of 3 down to 2?
> >> Do I just try to rebalance the nodes?
> >>
> >> The point is that you are looking at a very small cluster.
> >> You may want to start the new cluster with a replication factor of 2 and
> >> then when the data is moved over, increase it to a factor of 3. Or maybe not.
> >>
> >> I do a distcp to copy the data and after each distcp, I do an fsck for a
> >> sanity check and then remove the files I copied. As I gain more room, I can
> >> then slowly drop nodes, do an fsck, rebalance and then repeat.
> >>
> >> Even though this is a dev cluster, the OP wants to retain the data.
> >>
> >> There are other options depending on the amount and size of new hardware.
> >> I mean make one machine a RAID 5 machine, copy data to it, clearing off the cluster.
> >>
> >> If 8TB was the amount of disk used, that would be 2.6666 TB used.
> >> Let's say 3TB. Going raid 5, how much disk is that? So you could fit it
> >> on one machine, depending on hardware, or maybe 2 machines... Now you can
> >> rebuild the initial cluster and then move data back. Then rebuild those
> >> machines. Lots of options... ;-)
> >>
> >> Sent from a remote device. Please excuse any typos...
> >>
> >> Mike Segel
> >>
> >> On May 3, 2012, at 11:26 AM, Suresh Srinivas <sur...@hortonworks.com> wrote:
> >>
> >> > This probably is a more relevant question for the CDH mailing lists. That said,
> >> > what Edward is suggesting seems reasonable. Reduce the replication factor,
> >> > decommission some of the nodes and create a new cluster with those nodes
> >> > and do distcp.
> >> >
> >> > Could you share with us the reasons you want to migrate from Apache 205?
> >> >
> >> > Regards,
> >> > Suresh
> >> >
> >> > On Thu, May 3, 2012 at 8:25 AM, Edward Capriolo <edlinuxg...@gmail.com> wrote:
> >> >
> >> >> Honestly that is a hassle; going from 205 to cdh3u3 is probably more
> >> >> of a cross-grade than an upgrade or downgrade. I would just stick it
> >> >> out. But yes, like Michael said, two clusters on the same gear and
> >> >> distcp. If you are using RF=3 you could also lower your replication to
> >> >> rf=2 'hadoop dfs -setrep 2' to clear headroom as you are moving
> >> >> stuff.
> >> >>
> >> >>
> >> >> On Thu, May 3, 2012 at 7:25 AM, Michel Segel <michael_se...@hotmail.com> wrote:
> >> >>> Ok... When you get your new hardware...
> >> >>>
> >> >>> Set up one server as your new NN, JT, SN.
> >> >>> Set up the others as DNs.
> >> >>> (Cloudera CDH3u3)
> >> >>>
> >> >>> On your existing cluster...
> >> >>> Remove your old log files, temp files on HDFS, anything you don't need.
> >> >>> This should give you some more space.
> >> >>> Start copying some of the directories/files to the new cluster.
> >> >>> As you gain space, decommission a node, rebalance, add the node to the new cluster...
> >> >>>
> >> >>> It's a slow process.
> >> >>>
> >> >>> Should I remind you to make sure you up your bandwidth setting, and to
> >> >>> clean up the hdfs directories when you repurpose the nodes?
> >> >>>
> >> >>> Does this make sense?
> >> >>>
> >> >>> Sent from a remote device. Please excuse any typos...
> >> >>>
> >> >>> Mike Segel
> >> >>>
> >> >>> On May 3, 2012, at 5:46 AM, Austin Chungath <austi...@gmail.com> wrote:
> >> >>>
> >> >>>> Yeah I know :-)
> >> >>>> and this is not a production cluster ;-) and yes there is more hardware
> >> >>>> coming :-)
> >> >>>>
> >> >>>> On Thu, May 3, 2012 at 4:10 PM, Michel Segel <michael_se...@hotmail.com> wrote:
> >> >>>>
> >> >>>>> Well, you've kind of painted yourself into a corner...
> >> >>>>> Not sure why you didn't get a response from the Cloudera lists, but it's a
> >> >>>>> generic question...
> >> >>>>>
> >> >>>>> 8 out of 10 TB. Are you talking effective storage or actual disks?
> >> >>>>> And please tell me you've already ordered more hardware... Right?
> >> >>>>>
> >> >>>>> And please tell me this isn't your production cluster...
> >> >>>>>
> >> >>>>> (Strong hint to Strata and Cloudera...
> >> >>>>> You really want to accept my upcoming proposal talk... ;-)
> >> >>>>>
> >> >>>>> Sent from a remote device. Please excuse any typos...
> >> >>>>>
> >> >>>>> Mike Segel
> >> >>>>>
> >> >>>>> On May 3, 2012, at 5:25 AM, Austin Chungath <austi...@gmail.com> wrote:
> >> >>>>>
> >> >>>>>> Yes. This was first posted on the cloudera mailing list. There were no
> >> >>>>>> responses.
> >> >>>>>>
> >> >>>>>> But this is not related to cloudera as such.
> >> >>>>>>
> >> >>>>>> cdh3 is based on apache hadoop 0.20 as the base. My data is in apache
> >> >>>>>> hadoop 0.20.205
> >> >>>>>>
> >> >>>>>> There is an upgrade namenode option when we are migrating to a higher
> >> >>>>>> version say from 0.20 to 0.20.205
> >> >>>>>> but here I am downgrading from 0.20.205 to 0.20 (cdh3)
> >> >>>>>> Is this possible?
> >> >>>>>>
> >> >>>>>> On Thu, May 3, 2012 at 3:25 PM, Prashant Kommireddi <prash1...@gmail.com> wrote:
> >> >>>>>>
> >> >>>>>>> Seems like a matter of upgrade. I am not a Cloudera user so would not know
> >> >>>>>>> much, but you might find some help moving this to Cloudera mailing list.
> >> >>>>>>>
> >> >>>>>>> On Thu, May 3, 2012 at 2:51 AM, Austin Chungath <austi...@gmail.com> wrote:
> >> >>>>>>>
> >> >>>>>>>> There is only one cluster. I am not copying between clusters.
> >> >>>>>>>>
> >> >>>>>>>> Say I have a cluster running apache 0.20.205 with 10 TB storage capacity
> >> >>>>>>>> and has about 8 TB of data.
> >> >>>>>>>> Now how can I migrate the same cluster to use cdh3 and use that same 8 TB
> >> >>>>>>>> of data.
> >> >>>>>>>>
> >> >>>>>>>> I can't copy 8 TB of data using distcp because I have only 2 TB of free
> >> >>>>>>>> space
> >> >>>>>>>>
> >> >>>>>>>> On Thu, May 3, 2012 at 3:12 PM, Nitin Pawar <nitinpawar...@gmail.com> wrote:
> >> >>>>>>>>
> >> >>>>>>>>> you can actually look at the distcp
> >> >>>>>>>>>
> >> >>>>>>>>> http://hadoop.apache.org/common/docs/r0.20.0/distcp.html
> >> >>>>>>>>>
> >> >>>>>>>>> but this means that you have two different set of clusters available to do
> >> >>>>>>>>> the migration
> >> >>>>>>>>>
> >> >>>>>>>>> On Thu, May 3, 2012 at 12:51 PM, Austin Chungath <austi...@gmail.com> wrote:
> >> >>>>>>>>>
> >> >>>>>>>>>> Thanks for the suggestions,
> >> >>>>>>>>>> My concerns are that I can't actually copyToLocal from the dfs because
> >> >>>>>>>>>> the data is huge.
> >> >>>>>>>>>>
> >> >>>>>>>>>> Say if my hadoop was 0.20 and I am upgrading to 0.20.205 I can do a
> >> >>>>>>>>>> namenode upgrade. I don't have to copy data out of dfs.
> >> >>>>>>>>>>
> >> >>>>>>>>>> But here I am having Apache hadoop 0.20.205 and I want to use CDH3 now,
> >> >>>>>>>>>> which is based on 0.20
> >> >>>>>>>>>> Now it is actually a downgrade as 0.20.205's namenode info has to be used
> >> >>>>>>>>>> by 0.20's namenode.
> >> >>>>>>>>>>
> >> >>>>>>>>>> Any idea how I can achieve what I am trying to do?
> >> >>>>>>>>>>
> >> >>>>>>>>>> Thanks.
> >> >>>>>>>>>>
> >> >>>>>>>>>> On Thu, May 3, 2012 at 12:23 PM, Nitin Pawar <nitinpawar...@gmail.com> wrote:
> >> >>>>>>>>>>
> >> >>>>>>>>>>> i can think of following options
> >> >>>>>>>>>>>
> >> >>>>>>>>>>> 1) write a simple get and put code which gets the data from DFS and
> >> >>>>>>>>>>> loads it in dfs
> >> >>>>>>>>>>> 2) see if the distcp between both versions are compatible
> >> >>>>>>>>>>> 3) this is what I had done (and my data was hardly few hundred GB) .. did a
> >> >>>>>>>>>>> dfs -copyToLocal and then in the new grid did a copyFromLocal
> >> >>>>>>>>>>>
> >> >>>>>>>>>>> On Thu, May 3, 2012 at 11:41 AM, Austin Chungath <austi...@gmail.com> wrote:
> >> >>>>>>>>>>>
> >> >>>>>>>>>>>> Hi,
> >> >>>>>>>>>>>> I am migrating from Apache hadoop 0.20.205 to CDH3u3.
> >> >>>>>>>>>>>> I don't want to lose the data that is in the HDFS of Apache hadoop 0.20.205.
> >> >>>>>>>>>>>> How do I migrate to CDH3u3 but keep the data that I have on 0.20.205.
> >> >>>>>>>>>>>> What is the best practice/techniques to do this?
> >> >>>>>>>>>>>>
> >> >>>>>>>>>>>> Thanks & Regards,
> >> >>>>>>>>>>>> Austin
> >> >>>>>>>>>>>
> >> >>>>>>>>>>> --
> >> >>>>>>>>>>> Nitin Pawar
> >> >>>>>>>>>
> >> >>>>>>>>> --
> >> >>>>>>>>> Nitin Pawar

--
Nitin Pawar