Thanks. So I decided to try moving the data using distcp.
$ hadoop distcp hdfs://localhost:54310/tmp hdfs://localhost:8021/tmp_copy
12/05/07 14:57:38 INFO tools.DistCp: srcPaths=[hdfs://localhost:54310/tmp]
12/05/07 14:57:38 INFO tools.DistCp: destPath=hdfs://localhost:8021/tmp_copy
With failures, global counters are inaccurate; consider running with -i
Copy failed: org.apache.hadoop.ipc.RPC$VersionMismatch: Protocol
org.apache.hadoop.hdfs.protocol.ClientProtocol version mismatch.
(client = 63, server = 61)

I found that distcp over hdfs:// works only when both clusters run the
same Hadoop version, so I tried hftp:

$ hadoop distcp hftp://localhost:50070/tmp hdfs://localhost:60070/tmp_copy
12/05/07 15:02:44 INFO tools.DistCp: srcPaths=[hftp://localhost:50070/tmp]
12/05/07 15:02:44 INFO tools.DistCp: destPath=hdfs://localhost:60070/tmp_copy

But the process seems to hang at this stage. What might I be doing wrong?

The source format is hftp://<dfs.http.address>/<path>:
hftp://localhost:50070 is the dfs.http.address of 0.20.205;
hdfs://localhost:60070 is the dfs.http.address of cdh3u3.

Thanks and regards,
Austin
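A likely culprit, for what it's worth: the hdfs:// side of a distcp
points at the namenode's RPC port (the one in fs.default.name), not at
its dfs.http.address, and a cross-version copy has to run from the
destination cluster because hftp is read-only. A minimal sketch,
assuming the stock CDH3 RPC port of 8020; check fs.default.name in the
CDH3 core-site.xml for the real value:

# Run with the CDH3u3 client, on the CDH3u3 cluster: source over
# read-only hftp (version-independent), destination over native hdfs
# at the RPC port (8020 here is an assumption, not a confirmed config).
$ hadoop distcp hftp://localhost:50070/tmp hdfs://localhost:8020/tmp_copy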
On Fri, May 4, 2012 at 4:30 AM, Michel Segel <michael_se...@hotmail.com>
wrote:

> Ok... So riddle me this...
> I currently have a replication factor of 3.
> I reset it to two.
>
> What do you have to do to get the replication factor down from 3 to 2?
> Do I just try to rebalance the nodes?
>
> The point is that you are looking at a very small cluster.
> You may want to start the new cluster with a replication factor of 2
> and then, once the data is moved over, increase it to a factor of 3.
> Or maybe not.
>
> I do a distcp to copy the data, and after each distcp I do an fsck for
> a sanity check and then remove the files I copied. As I gain more room,
> I can then slowly drop nodes, do an fsck, rebalance, and then repeat.
>
> Even though this is a dev cluster, the OP wants to retain the data.
>
> There are other options depending on the amount and size of new
> hardware. I mean, make one machine a RAID 5 machine and copy data to
> it, clearing off the cluster.
>
> If 8 TB was the amount of raw disk used, that would be about 2.67 TB
> of actual data. Let's say 3 TB. Going RAID 5, how much disk is that?
> So you could fit it on one machine, depending on hardware, or maybe
> 2 machines... Now you can rebuild the initial cluster and then move
> the data back. Then rebuild those machines. Lots of options... ;-)
>
> Sent from a remote device. Please excuse any typos...
>
> Mike Segel
>
> On May 3, 2012, at 11:26 AM, Suresh Srinivas <sur...@hortonworks.com>
> wrote:
>
> > This is probably a more relevant question for the CDH mailing lists.
> > That said, what Edward is suggesting seems reasonable. Reduce the
> > replication factor, decommission some of the nodes, create a new
> > cluster with those nodes, and do a distcp.
> >
> > Could you share with us the reasons you want to migrate from
> > Apache 205?
> >
> > Regards,
> > Suresh
> >
> > On Thu, May 3, 2012 at 8:25 AM, Edward Capriolo
> > <edlinuxg...@gmail.com> wrote:
> >
> >> Honestly that is a hassle; going from 205 to cdh3u3 is probably
> >> more of a cross-grade than an upgrade or downgrade. I would just
> >> stick it out. But yes, like Michael said, run two clusters on the
> >> same gear and distcp. If you are using RF=3 you could also lower
> >> your replication to RF=2 ('hadoop dfs -setrep 2') to clear headroom
> >> as you are moving stuff.
> >>
> >>
> >> On Thu, May 3, 2012 at 7:25 AM, Michel Segel
> >> <michael_se...@hotmail.com> wrote:
> >>> Ok... When you get your new hardware...
> >>>
> >>> Set up one server as your new NN, JT, SN.
> >>> Set up the others as DNs.
> >>> (Cloudera CDH3u3)
> >>>
> >>> On your existing cluster...
> >>> Remove your old log files, temp files on HDFS, anything you don't
> >>> need. This should give you some more space.
> >>> Start copying some of the directories/files to the new cluster.
> >>> As you gain space, decommission a node, rebalance, add the node to
> >>> the new cluster...
> >>>
> >>> It's a slow process.
> >>>
> >>> Should I remind you to make sure you up your bandwidth setting,
> >>> and to clean up the HDFS directories when you repurpose the nodes?
> >>>
> >>> Does this make sense?
> >>>
> >>> Sent from a remote device. Please excuse any typos...
> >>>
> >>> Mike Segel
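Pulling Edward's and Michel's suggestions together, here is a sketch of
that shrink-and-migrate loop in 0.20-era shell commands. The paths,
hostnames, and the 8020 port are placeholders, and the decommission step
assumes the stock dfs.hosts.exclude mechanism; adjust to your configs:

# 1. Clear headroom on the old cluster by dropping replication
#    (Edward's tip). -R recurses; -w waits until the extra replicas
#    have actually been deleted.
$ hadoop dfs -setrep -R -w 2 /

# 2. Copy a batch, verify it, then free the source space.
$ hadoop distcp hftp://oldnn:50070/some/dir hdfs://newnn:8020/some/dir
$ hadoop fsck /some/dir        # run on the new cluster (sanity check)
$ hadoop dfs -rmr /some/dir    # run on the old cluster, only once fsck is clean

# 3. Decommission a freed-up datanode: add its hostname to the file
#    that dfs.hosts.exclude names in hdfs-site.xml, then tell the NN:
$ hadoop dfsadmin -refreshNodes

# 4. Rebalance the remaining nodes. Raising dfs.balance.bandwidthPerSec
#    (bytes per second) beforehand is the "bandwidth setting" Michel
#    is referring to.
$ hadoop balancer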
> >>> On May 3, 2012, at 5:46 AM, Austin Chungath <austi...@gmail.com>
> >>> wrote:
> >>>
> >>>> Yeah I know :-)
> >>>> and this is not a production cluster ;-) and yes there is more
> >>>> hardware coming :-)
> >>>>
> >>>> On Thu, May 3, 2012 at 4:10 PM, Michel Segel
> >>>> <michael_se...@hotmail.com> wrote:
> >>>>
> >>>>> Well, you've kind of painted yourself into a corner...
> >>>>> Not sure why you didn't get a response from the Cloudera lists,
> >>>>> but it's a generic question...
> >>>>>
> >>>>> 8 out of 10 TB. Are you talking effective storage or actual
> >>>>> disks? And please tell me you've already ordered more
> >>>>> hardware... Right?
> >>>>>
> >>>>> And please tell me this isn't your production cluster...
> >>>>>
> >>>>> (Strong hint to Strata and Cloudera... You really want to accept
> >>>>> my upcoming proposal talk... ;-)
> >>>>>
> >>>>>
> >>>>> Sent from a remote device. Please excuse any typos...
> >>>>>
> >>>>> Mike Segel
> >>>>>
> >>>>> On May 3, 2012, at 5:25 AM, Austin Chungath <austi...@gmail.com>
> >>>>> wrote:
> >>>>>
> >>>>>> Yes. This was first posted on the Cloudera mailing list. There
> >>>>>> were no responses.
> >>>>>>
> >>>>>> But this is not related to Cloudera as such.
> >>>>>>
> >>>>>> cdh3 uses apache hadoop 0.20 as the base. My data is in apache
> >>>>>> hadoop 0.20.205.
> >>>>>>
> >>>>>> There is an upgrade namenode option when migrating to a higher
> >>>>>> version, say from 0.20 to 0.20.205,
> >>>>>> but here I am downgrading from 0.20.205 to 0.20 (cdh3).
> >>>>>> Is this possible?
> >>>>>>
> >>>>>>
> >>>>>> On Thu, May 3, 2012 at 3:25 PM, Prashant Kommireddi
> >>>>>> <prash1...@gmail.com> wrote:
> >>>>>>
> >>>>>>> Seems like a matter of upgrade. I am not a Cloudera user so I
> >>>>>>> would not know much, but you might find some help moving this
> >>>>>>> to the Cloudera mailing list.
> >>>>>>>
> >>>>>>> On Thu, May 3, 2012 at 2:51 AM, Austin Chungath
> >>>>>>> <austi...@gmail.com> wrote:
> >>>>>>>
> >>>>>>>> There is only one cluster. I am not copying between clusters.
> >>>>>>>>
> >>>>>>>> Say I have a cluster running apache 0.20.205 with 10 TB of
> >>>>>>>> storage capacity and about 8 TB of data.
> >>>>>>>> Now how can I migrate the same cluster to use cdh3 and keep
> >>>>>>>> that same 8 TB of data?
> >>>>>>>>
> >>>>>>>> I can't copy 8 TB of data using distcp because I have only
> >>>>>>>> 2 TB of free space.
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> On Thu, May 3, 2012 at 3:12 PM, Nitin Pawar
> >>>>>>>> <nitinpawar...@gmail.com> wrote:
> >>>>>>>>
> >>>>>>>>> you can actually look at distcp:
> >>>>>>>>>
> >>>>>>>>> http://hadoop.apache.org/common/docs/r0.20.0/distcp.html
> >>>>>>>>>
> >>>>>>>>> but this means that you have two different clusters
> >>>>>>>>> available to do the migration
> >>>>>>>>>
> >>>>>>>>> On Thu, May 3, 2012 at 12:51 PM, Austin Chungath
> >>>>>>>>> <austi...@gmail.com> wrote:
> >>>>>>>>>
> >>>>>>>>>> Thanks for the suggestions,
> >>>>>>>>>> My concern is that I can't actually copyToLocal from the
> >>>>>>>>>> dfs because the data is huge.
> >>>>>>>>>>
> >>>>>>>>>> Say my hadoop was 0.20 and I am upgrading to 0.20.205: I
> >>>>>>>>>> can do a namenode upgrade. I don't have to copy data out
> >>>>>>>>>> of dfs.
> >>>>>>>>>>
> >>>>>>>>>> But here I have Apache hadoop 0.20.205 and I want to use
> >>>>>>>>>> CDH3 now, which is based on 0.20.
> >>>>>>>>>> Now it is actually a downgrade, as 0.20.205's namenode info
> >>>>>>>>>> has to be used by 0.20's namenode.
> >>>>>>>>>>
> >>>>>>>>>> Any idea how I can achieve what I am trying to do?
> >>>>>>>>>>
> >>>>>>>>>> Thanks.
> >>>>>>>>>>
> >>>>>>>>>> On Thu, May 3, 2012 at 12:23 PM, Nitin Pawar
> >>>>>>>>>> <nitinpawar...@gmail.com> wrote:
> >>>>>>>>>>
> >>>>>>>>>>> i can think of the following options:
> >>>>>>>>>>>
> >>>>>>>>>>> 1) write simple get and put code that reads the data out
> >>>>>>>>>>> of the old DFS and loads it into the new one
> >>>>>>>>>>> 2) see if distcp between the two versions is compatible
> >>>>>>>>>>> 3) this is what I had done (and my data was hardly a few
> >>>>>>>>>>> hundred GB): did a dfs -copyToLocal and then in the new
> >>>>>>>>>>> grid did a copyFromLocal
> >>>>>>>>>>>
> >>>>>>>>>>> On Thu, May 3, 2012 at 11:41 AM, Austin Chungath
> >>>>>>>>>>> <austi...@gmail.com> wrote:
> >>>>>>>>>>>
> >>>>>>>>>>>> Hi,
> >>>>>>>>>>>> I am migrating from Apache hadoop 0.20.205 to CDH3u3.
> >>>>>>>>>>>> I don't want to lose the data that is in the HDFS of
> >>>>>>>>>>>> Apache hadoop 0.20.205.
> >>>>>>>>>>>> How do I migrate to CDH3u3 but keep the data that I have
> >>>>>>>>>>>> on 0.20.205?
> >>>>>>>>>>>> What are the best practices/techniques to do this?
> >>>>>>>>>>>>
> >>>>>>>>>>>> Thanks & Regards,
> >>>>>>>>>>>> Austin
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>> --
> >>>>>>>>>>> Nitin Pawar
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> --
> >>>>>>>>> Nitin Pawar
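For completeness, Nitin's option 3 sketched in 0.20-era shell commands.
The /data and /staging paths are placeholders, and as noted above it
only works when there is somewhere to stage the data, which is exactly
what rules it out for 8 TB with 2 TB free:

# With the 0.20.205 client, pull the data out of the old DFS:
$ hadoop dfs -copyToLocal /data /staging/data

# Then, with the CDH3u3 client configured against the new cluster,
# push it back in. Each client only talks to its own cluster's version,
# so there is no RPC version mismatch:
$ hadoop dfs -copyFromLocal /staging/data /data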