Thanks. So I decided to try moving the data using distcp.
$ hadoop distcp hdfs://localhost:54310/tmp hdfs://localhost:8021/tmp_copy
12/05/07 14:57:38 INFO tools.DistCp: srcPaths=[hdfs://localhost:54310/tmp]
12/05/07 14:57:38 INFO tools.DistCp: destPath=hdfs://localhost:8021/tmp_copy
With failures, global counters are inaccurate; consider running with -i
Copy failed: org.apache.hadoop.ipc.RPC$VersionMismatch: Protocol
org.apache.hadoop.hdfs.protocol.ClientProtocol version mismatch.
(client = 63, server = 61)

I found that distcp over hdfs:// works only when both clusters run the
same Hadoop version, so I tried hftp:

$ hadoop distcp hftp://localhost:50070/tmp hdfs://localhost:60070/tmp_copy
12/05/07 15:02:44 INFO tools.DistCp: srcPaths=[hftp://localhost:50070/tmp]
12/05/07 15:02:44 INFO tools.DistCp: destPath=hdfs://localhost:60070/tmp_copy

But the process seems to hang at this stage. What might I be doing wrong?

The source format is hftp://<dfs.http.address>/<path>:
hftp://localhost:50070 is the dfs.http.address of 0.20.205;
hdfs://localhost:60070 is the dfs.http.address of cdh3u3.

Thanks and regards,
Austin
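A likely culprit, for what it's worth: the hdfs:// side of a distcp
points at the namenode's RPC port (the one in fs.default.name), not at
its dfs.http.address, and a cross-version copy has to run from the
destination cluster because hftp is read-only. A minimal sketch,
assuming the stock CDH3 RPC port of 8020; check fs.default.name in the
CDH3 core-site.xml for the real value:

# Run with the CDH3u3 client, on the CDH3u3 cluster: source over
# read-only hftp (version-independent), destination over native hdfs
# at the RPC port (8020 here is an assumption, not a confirmed config).
$ hadoop distcp hftp://localhost:50070/tmp hdfs://localhost:8020/tmp_copy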
On Fri, May 4, 2012 at 4:30 AM, Michel Segel <michael_se...@hotmail.com>
wrote:

> Ok... So riddle me this...
> I currently have a replication factor of 3.
> I reset it to two.
>
> What do you have to do to get the replication factor down from 3 to 2?
> Do I just try to rebalance the nodes?
>
> The point is that you are looking at a very small cluster.
> You may want to start the new cluster with a replication factor of 2
> and then, once the data is moved over, increase it to a factor of 3.
> Or maybe not.
>
> I do a distcp to copy the data, and after each distcp I do an fsck for
> a sanity check and then remove the files I copied. As I gain more room,
> I can then slowly drop nodes, do an fsck, rebalance, and then repeat.
>
> Even though this is a dev cluster, the OP wants to retain the data.
>
> There are other options depending on the amount and size of new
> hardware. I mean, make one machine a RAID 5 machine and copy data to
> it, clearing off the cluster.
>
> If 8 TB was the amount of raw disk used, that would be about 2.67 TB
> of actual data. Let's say 3 TB. Going RAID 5, how much disk is that?
> So you could fit it on one machine, depending on hardware, or maybe
> 2 machines... Now you can rebuild the initial cluster and then move
> the data back. Then rebuild those machines. Lots of options... ;-)
>
> Sent from a remote device. Please excuse any typos...
>
> Mike Segel
>
> On May 3, 2012, at 11:26 AM, Suresh Srinivas <sur...@hortonworks.com>
> wrote:
>
> > This is probably a more relevant question for the CDH mailing lists.
> > That said, what Edward is suggesting seems reasonable. Reduce the
> > replication factor, decommission some of the nodes, create a new
> > cluster with those nodes, and do a distcp.
> >
> > Could you share with us the reasons you want to migrate from
> > Apache 205?
> >
> > Regards,
> > Suresh
> >
> > On Thu, May 3, 2012 at 8:25 AM, Edward Capriolo
> > <edlinuxg...@gmail.com> wrote:
> >
> >> Honestly that is a hassle; going from 205 to cdh3u3 is probably
> >> more of a cross-grade than an upgrade or downgrade. I would just
> >> stick it out. But yes, like Michael said, run two clusters on the
> >> same gear and distcp. If you are using RF=3 you could also lower
> >> your replication to RF=2 ('hadoop dfs -setrep 2') to clear headroom
> >> as you are moving stuff.
> >>
> >>
> >> On Thu, May 3, 2012 at 7:25 AM, Michel Segel
> >> <michael_se...@hotmail.com> wrote:
> >>> Ok... When you get your new hardware...
> >>>
> >>> Set up one server as your new NN, JT, SN.
> >>> Set up the others as DNs.
> >>> (Cloudera CDH3u3)
> >>>
> >>> On your existing cluster...
> >>> Remove your old log files, temp files on HDFS, anything you don't
> >>> need. This should give you some more space.
> >>> Start copying some of the directories/files to the new cluster.
> >>> As you gain space, decommission a node, rebalance, add the node to
> >>> the new cluster...
> >>>
> >>> It's a slow process.
> >>>
> >>> Should I remind you to make sure you up your bandwidth setting,
> >>> and to clean up the HDFS directories when you repurpose the nodes?
> >>>
> >>> Does this make sense?
> >>>
> >>> Sent from a remote device. Please excuse any typos...
> >>>
> >>> Mike Segel
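Pulling Edward's and Michel's suggestions together, here is a sketch of
that shrink-and-migrate loop in 0.20-era shell commands. The paths,
hostnames, and the 8020 port are placeholders, and the decommission step
assumes the stock dfs.hosts.exclude mechanism; adjust to your configs:

# 1. Clear headroom on the old cluster by dropping replication
#    (Edward's tip). -R recurses; -w waits until the extra replicas
#    have actually been deleted.
$ hadoop dfs -setrep -R -w 2 /

# 2. Copy a batch, verify it, then free the source space.
$ hadoop distcp hftp://oldnn:50070/some/dir hdfs://newnn:8020/some/dir
$ hadoop fsck /some/dir        # run on the new cluster (sanity check)
$ hadoop dfs -rmr /some/dir    # run on the old cluster, only once fsck is clean

# 3. Decommission a freed-up datanode: add its hostname to the file
#    that dfs.hosts.exclude names in hdfs-site.xml, then tell the NN:
$ hadoop dfsadmin -refreshNodes

# 4. Rebalance the remaining nodes. Raising dfs.balance.bandwidthPerSec
#    (bytes per second) beforehand is the "bandwidth setting" Michel
#    is referring to.
$ hadoop balancer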
> >>> On May 3, 2012, at 5:46 AM, Austin Chungath <austi...@gmail.com>
> >>> wrote:
> >>>
> >>>> Yeah I know :-)
> >>>> and this is not a production cluster ;-) and yes there is more
> >>>> hardware coming :-)
> >>>>
> >>>> On Thu, May 3, 2012 at 4:10 PM, Michel Segel
> >>>> <michael_se...@hotmail.com> wrote:
> >>>>
> >>>>> Well, you've kind of painted yourself into a corner...
> >>>>> Not sure why you didn't get a response from the Cloudera lists,
> >>>>> but it's a generic question...
> >>>>>
> >>>>> 8 out of 10 TB. Are you talking effective storage or actual
> >>>>> disks? And please tell me you've already ordered more
> >>>>> hardware... Right?
> >>>>>
> >>>>> And please tell me this isn't your production cluster...
> >>>>>
> >>>>> (Strong hint to Strata and Cloudera... You really want to accept
> >>>>> my upcoming proposal talk... ;-)
> >>>>>
> >>>>>
> >>>>> Sent from a remote device. Please excuse any typos...
> >>>>>
> >>>>> Mike Segel
> >>>>>
> >>>>> On May 3, 2012, at 5:25 AM, Austin Chungath <austi...@gmail.com>
> >>>>> wrote:
> >>>>>
> >>>>>> Yes. This was first posted on the Cloudera mailing list. There
> >>>>>> were no responses.
> >>>>>>
> >>>>>> But this is not related to Cloudera as such.
> >>>>>>
> >>>>>> cdh3 uses apache hadoop 0.20 as the base. My data is in apache
> >>>>>> hadoop 0.20.205.
> >>>>>>
> >>>>>> There is an upgrade namenode option when migrating to a higher
> >>>>>> version, say from 0.20 to 0.20.205,
> >>>>>> but here I am downgrading from 0.20.205 to 0.20 (cdh3).
> >>>>>> Is this possible?
> >>>>>>
> >>>>>>
> >>>>>> On Thu, May 3, 2012 at 3:25 PM, Prashant Kommireddi
> >>>>>> <prash1...@gmail.com> wrote:
> >>>>>>
> >>>>>>> Seems like a matter of upgrade. I am not a Cloudera user so I
> >>>>>>> would not know much, but you might find some help moving this
> >>>>>>> to the Cloudera mailing list.
> >>>>>>>
> >>>>>>> On Thu, May 3, 2012 at 2:51 AM, Austin Chungath
> >>>>>>> <austi...@gmail.com> wrote:
> >>>>>>>
> >>>>>>>> There is only one cluster. I am not copying between clusters.
> >>>>>>>>
> >>>>>>>> Say I have a cluster running apache 0.20.205 with 10 TB of
> >>>>>>>> storage capacity and about 8 TB of data.
> >>>>>>>> Now how can I migrate the same cluster to use cdh3 and keep
> >>>>>>>> that same 8 TB of data?
> >>>>>>>>
> >>>>>>>> I can't copy 8 TB of data using distcp because I have only
> >>>>>>>> 2 TB of free space.
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> On Thu, May 3, 2012 at 3:12 PM, Nitin Pawar
> >>>>>>>> <nitinpawar...@gmail.com> wrote:
> >>>>>>>>
> >>>>>>>>> you can actually look at distcp:
> >>>>>>>>>
> >>>>>>>>> http://hadoop.apache.org/common/docs/r0.20.0/distcp.html
> >>>>>>>>>
> >>>>>>>>> but this means that you have two different clusters
> >>>>>>>>> available to do the migration
> >>>>>>>>>
> >>>>>>>>> On Thu, May 3, 2012 at 12:51 PM, Austin Chungath
> >>>>>>>>> <austi...@gmail.com> wrote:
> >>>>>>>>>
> >>>>>>>>>> Thanks for the suggestions,
> >>>>>>>>>> My concern is that I can't actually copyToLocal from the
> >>>>>>>>>> dfs because the data is huge.
> >>>>>>>>>>
> >>>>>>>>>> Say my hadoop was 0.20 and I am upgrading to 0.20.205: I
> >>>>>>>>>> can do a namenode upgrade. I don't have to copy data out
> >>>>>>>>>> of dfs.
> >>>>>>>>>>
> >>>>>>>>>> But here I have Apache hadoop 0.20.205 and I want to use
> >>>>>>>>>> CDH3 now, which is based on 0.20.
> >>>>>>>>>> Now it is actually a downgrade, as 0.20.205's namenode info
> >>>>>>>>>> has to be used by 0.20's namenode.
> >>>>>>>>>>
> >>>>>>>>>> Any idea how I can achieve what I am trying to do?
> >>>>>>>>>>
> >>>>>>>>>> Thanks.
> >>>>>>>>>>
> >>>>>>>>>> On Thu, May 3, 2012 at 12:23 PM, Nitin Pawar
> >>>>>>>>>> <nitinpawar...@gmail.com> wrote:
> >>>>>>>>>>
> >>>>>>>>>>> i can think of the following options:
> >>>>>>>>>>>
> >>>>>>>>>>> 1) write simple get and put code that reads the data out
> >>>>>>>>>>> of the old DFS and loads it into the new one
> >>>>>>>>>>> 2) see if distcp between the two versions is compatible
> >>>>>>>>>>> 3) this is what I had done (and my data was hardly a few
> >>>>>>>>>>> hundred GB): did a dfs -copyToLocal and then in the new
> >>>>>>>>>>> grid did a copyFromLocal
> >>>>>>>>>>>
> >>>>>>>>>>> On Thu, May 3, 2012 at 11:41 AM, Austin Chungath
> >>>>>>>>>>> <austi...@gmail.com> wrote:
> >>>>>>>>>>>
> >>>>>>>>>>>> Hi,
> >>>>>>>>>>>> I am migrating from Apache hadoop 0.20.205 to CDH3u3.
> >>>>>>>>>>>> I don't want to lose the data that is in the HDFS of
> >>>>>>>>>>>> Apache hadoop 0.20.205.
> >>>>>>>>>>>> How do I migrate to CDH3u3 but keep the data that I have
> >>>>>>>>>>>> on 0.20.205?
> >>>>>>>>>>>> What are the best practices/techniques to do this?
> >>>>>>>>>>>>
> >>>>>>>>>>>> Thanks & Regards,
> >>>>>>>>>>>> Austin
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>> --
> >>>>>>>>>>> Nitin Pawar
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> --
> >>>>>>>>> Nitin Pawar
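For completeness, Nitin's option 3 sketched in 0.20-era shell commands.
The /data and /staging paths are placeholders, and as noted above it
only works when there is somewhere to stage the data, which is exactly
what rules it out for 8 TB with 2 TB free:

# With the 0.20.205 client, pull the data out of the old DFS:
$ hadoop dfs -copyToLocal /data /staging/data

# Then, with the CDH3u3 client configured against the new cluster,
# push it back in. Each client only talks to its own cluster's version,
# so there is no RPC version mismatch:
$ hadoop dfs -copyFromLocal /staging/data /data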