Thanks Adam, that was very helpful. Your second point solved my problem :-) The hdfs port number was wrong. I didn't use the -ppgu option; what does it do?
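A note on -ppgu: in the 0.20-era distcp, -p takes a string of attribute letters (r: replication, b: block size, u: user, g: group, p: permission), so -ppgu preserves the permission, group, and user of each file it copies instead of letting the destination apply defaults. A one-line sketch, with placeholder hosts and paths:

    hadoop distcp -ppgu hftp://source-nn:50070/src/path hdfs://dest-nn:9000/dst/path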
On Mon, May 7, 2012 at 8:07 PM, Adam Faris <afa...@linkedin.com> wrote:
> Hi Austin,
>
> I don't know about using CDH3, but we use distcp for moving data between
> different versions of Apache grids, and several things come to mind.
>
> 1) You should use the -i flag to ignore checksum differences on the
> blocks. I'm not 100% sure, but I want to say hftp doesn't support
> checksums on the blocks as they go across the wire.
>
> 2) You should read from hftp but write to hdfs. Also make sure to check
> your port numbers. For example, I can read from hftp on port 50070 and
> write to hdfs on port 9000. You'll find the hftp port in hdfs-site.xml
> and the hdfs port in core-site.xml on Apache releases.
>
> 3) Do you have security (Kerberos) enabled on 0.20.205? Does CDH3
> support security? If security is enabled on 0.20.205 and CDH3 does not
> support it, you will need to disable security on 0.20.205. This is
> because you are unable to write from a secure to an unsecured grid.
>
> 4) Use the -m flag to limit your mappers so you don't DDoS your network
> backbone.
>
> 5) Why isn't your vendor helping you with the data migration? :)
>
> Otherwise, something like this should get you going:
>
> hadoop distcp -i -ppgu -log /tmp/mylog -m 20
>     hftp://mynamenode.grid.one:50070/path/to/my/src/data
>     hdfs://mynamenode.grid.two:9000/path/to/my/dst
>
> -- Adam
>
> On May 7, 2012, at 4:29 AM, Nitin Pawar wrote:
>
>> Things to check:
>>
>> 1) when you launch the distcp job, all the datanodes of the older hdfs
>> are live and connected
>> 2) when you launch distcp, no data is being written/moved/deleted in hdfs
>> 3) you can use the -log option to log errors into a directory, and -i
>> to ignore errors
>>
>> Also, you can try using distcp with the hdfs protocol instead of hftp.
>> For more you can refer to
>> https://groups.google.com/a/cloudera.org/group/cdh-user/browse_thread/thread/d0d99ad9f1554edd
>>
>> If it failed, there should be some error.
>>
>> On Mon, May 7, 2012 at 4:44 PM, Austin Chungath <austi...@gmail.com> wrote:
>>
>>> Ok, that was a lame mistake.
>>> $ hadoop distcp hftp://localhost:50070/tmp hftp://localhost:60070/tmp_copy
>>> I had spelled hdfs instead of "hftp".
>>>
>>> $ hadoop distcp hftp://localhost:50070/docs/index.html
>>>     hftp://localhost:60070/user/hadoop
>>> 12/05/07 16:38:09 INFO tools.DistCp:
>>>     srcPaths=[hftp://localhost:50070/docs/index.html]
>>> 12/05/07 16:38:09 INFO tools.DistCp:
>>>     destPath=hftp://localhost:60070/user/hadoop
>>> With failures, global counters are inaccurate; consider running with -i
>>> Copy failed: java.io.IOException: Not supported
>>>   at org.apache.hadoop.hdfs.HftpFileSystem.delete(HftpFileSystem.java:457)
>>>   at org.apache.hadoop.tools.DistCp.fullyDelete(DistCp.java:963)
>>>   at org.apache.hadoop.tools.DistCp.copy(DistCp.java:672)
>>>   at org.apache.hadoop.tools.DistCp.run(DistCp.java:881)
>>>   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>>>   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
>>>   at org.apache.hadoop.tools.DistCp.main(DistCp.java:908)
>>>
>>> Any idea why this error is coming up?
>>> I am copying one file from 0.20.205 (/docs/index.html) to cdh3u3
>>> (/user/hadoop).
>>>
>>> Thanks & Regards,
>>> Austin
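The stack trace above points at the problem: hftp is a read-only filesystem (hence HftpFileSystem.delete() throwing "Not supported"), so the destination of the copy has to be a writable hdfs:// URI, as Adam notes in point 2. A minimal sketch of the corrected direction, assuming the job runs on the CDH3 side and that 8021 stands in for whatever RPC port its fs.default.name declares:

    # read over hftp from the 0.20.205 grid, write over hdfs to the CDH3 grid
    hadoop distcp -i hftp://localhost:50070/docs/index.html hdfs://localhost:8021/user/hadoop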
>>> On Mon, May 7, 2012 at 3:57 PM, Austin Chungath <austi...@gmail.com> wrote:
>>>
>>>> Thanks.
>>>>
>>>> So I decided to try and move the data using distcp:
>>>>
>>>> $ hadoop distcp hdfs://localhost:54310/tmp hdfs://localhost:8021/tmp_copy
>>>> 12/05/07 14:57:38 INFO tools.DistCp: srcPaths=[hdfs://localhost:54310/tmp]
>>>> 12/05/07 14:57:38 INFO tools.DistCp: destPath=hdfs://localhost:8021/tmp_copy
>>>> With failures, global counters are inaccurate; consider running with -i
>>>> Copy failed: org.apache.hadoop.ipc.RPC$VersionMismatch: Protocol
>>>>     org.apache.hadoop.hdfs.protocol.ClientProtocol version mismatch.
>>>>     (client = 63, server = 61)
>>>>
>>>> I found that we can do distcp like the above only if both clusters run
>>>> the same hadoop version, so I tried:
>>>>
>>>> $ hadoop distcp hftp://localhost:50070/tmp hdfs://localhost:60070/tmp_copy
>>>> 12/05/07 15:02:44 INFO tools.DistCp: srcPaths=[hftp://localhost:50070/tmp]
>>>> 12/05/07 15:02:44 INFO tools.DistCp: destPath=hdfs://localhost:60070/tmp_copy
>>>>
>>>> But this process seems to hang at this stage. What might I be doing wrong?
>>>>
>>>> hftp://<dfs.http.address>/<path>
>>>> hftp://localhost:50070 is the dfs.http.address of 0.20.205
>>>> hdfs://localhost:60070 is the dfs.http.address of cdh3u3
>>>>
>>>> Thanks and regards,
>>>> Austin
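The hang is consistent with the port mix-up: an hdfs:// URI has to point at the namenode's RPC port (the one in fs.default.name), while 50070/60070 are the web UI ports that only the hftp scheme understands. A quick way to confirm both values, assuming the stock conf layout on each cluster:

    # hftp (read) side: web UI port of the source namenode
    grep -A1 dfs.http.address $HADOOP_HOME/conf/hdfs-site.xml
    # hdfs (write) side: RPC port of the destination namenode
    grep -A1 fs.default.name $HADOOP_HOME/conf/core-site.xml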
>>>> On Fri, May 4, 2012 at 4:30 AM, Michel Segel <michael_se...@hotmail.com> wrote:
>>>>
>>>>> Ok... So riddle me this...
>>>>> I currently have a replication factor of 3.
>>>>> I reset it to two.
>>>>>
>>>>> What do you have to do to get the replication factor down from 3 to 2?
>>>>> Do I just try to rebalance the nodes?
>>>>>
>>>>> The point is that you are looking at a very small cluster.
>>>>> You may want to start the new cluster with a replication factor of 2
>>>>> and then, when the data is moved over, increase it to a factor of 3.
>>>>> Or maybe not.
>>>>>
>>>>> I do a distcp to copy the data, and after each distcp I do an fsck as
>>>>> a sanity check and then remove the files I copied. As I gain more
>>>>> room, I can then slowly drop nodes, do an fsck, rebalance, and repeat.
>>>>>
>>>>> Even though this is a dev cluster, the OP wants to retain the data.
>>>>>
>>>>> There are other options depending on the amount and size of the new
>>>>> hardware. I mean, make one machine a RAID 5 machine and copy data to
>>>>> it, clearing off the cluster.
>>>>>
>>>>> If 8 TB is the amount of disk used, that would be about 2.67 TB of
>>>>> actual data at replication 3. Let's say 3 TB. Going RAID 5, how much
>>>>> disk is that? So you could fit it on one machine, depending on the
>>>>> hardware, or maybe 2 machines... Now you can rebuild the initial
>>>>> cluster and then move the data back. Then rebuild those machines.
>>>>> Lots of options... ;-)
>>>>>
>>>>> Sent from a remote device. Please excuse any typos...
>>>>>
>>>>> Mike Segel
>>>>>
>>>>> On May 3, 2012, at 11:26 AM, Suresh Srinivas <sur...@hortonworks.com> wrote:
>>>>>
>>>>>> This is probably a more relevant question for the CDH mailing lists.
>>>>>> That said, what Edward is suggesting seems reasonable: reduce the
>>>>>> replication factor, decommission some of the nodes, create a new
>>>>>> cluster with those nodes, and do a distcp.
>>>>>>
>>>>>> Could you share with us the reasons you want to migrate from Apache 205?
>>>>>>
>>>>>> Regards,
>>>>>> Suresh
>>>>>>
>>>>>> On Thu, May 3, 2012 at 8:25 AM, Edward Capriolo <edlinuxg...@gmail.com> wrote:
>>>>>>
>>>>>>> Honestly, that is a hassle. Going from 205 to cdh3u3 is probably
>>>>>>> more of a cross-grade than an upgrade or downgrade. I would just
>>>>>>> stick it out. But yes, like Michael said, two clusters on the same
>>>>>>> gear and distcp. If you are using RF=3 you could also lower your
>>>>>>> replication to RF=2 ('hadoop dfs -setrep 2') to clear headroom as
>>>>>>> you are moving stuff.
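Edward's headroom trick plus Michel's fsck sanity check, spelled out as commands. A sketch assuming the whole namespace should drop to replication 2; the -R flag and the cluster-wide / target are assumptions here, not from the thread:

    # recursively set the replication factor to 2 to free up space
    hadoop dfs -setrep -R 2 /
    # verify block health once the namenode has scheduled the excess-replica deletions
    hadoop fsck /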
>>>>>>> On Thu, May 3, 2012 at 7:25 AM, Michel Segel <michael_se...@hotmail.com> wrote:
>>>>>>>
>>>>>>>> Ok... When you get your new hardware...
>>>>>>>>
>>>>>>>> Set up one server as your new NN, JT, SN.
>>>>>>>> Set up the others as DNs.
>>>>>>>> (Cloudera CDH3u3)
>>>>>>>>
>>>>>>>> On your existing cluster...
>>>>>>>> Remove your old log files, temp files on HDFS, anything you don't
>>>>>>>> need. This should give you some more space.
>>>>>>>> Start copying some of the directories/files to the new cluster.
>>>>>>>> As you gain space, decommission a node, rebalance, add the node to
>>>>>>>> the new cluster...
>>>>>>>>
>>>>>>>> It's a slow process.
>>>>>>>>
>>>>>>>> Should I remind you to make sure you up your bandwidth setting, and
>>>>>>>> to clean up the hdfs directories when you repurpose the nodes?
>>>>>>>>
>>>>>>>> Does this make sense?
>>>>>>>>
>>>>>>>> Sent from a remote device. Please excuse any typos...
>>>>>>>>
>>>>>>>> Mike Segel
>>>>>>>>
>>>>>>>> On May 3, 2012, at 5:46 AM, Austin Chungath <austi...@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Yeah, I know :-)
>>>>>>>>> and this is not a production cluster ;-) and yes, there is more
>>>>>>>>> hardware coming :-)
>>>>>>>>>
>>>>>>>>> On Thu, May 3, 2012 at 4:10 PM, Michel Segel <michael_se...@hotmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> Well, you've kind of painted yourself into a corner...
>>>>>>>>>> Not sure why you didn't get a response from the Cloudera lists,
>>>>>>>>>> but it's a generic question...
>>>>>>>>>>
>>>>>>>>>> 8 out of 10 TB. Are you talking effective storage or actual disks?
>>>>>>>>>> And please tell me you've already ordered more hardware... Right?
>>>>>>>>>>
>>>>>>>>>> And please tell me this isn't your production cluster...
>>>>>>>>>>
>>>>>>>>>> (Strong hint to Strata and Cloudera... You really want to accept
>>>>>>>>>> my upcoming proposal talk... ;-)
>>>>>>>>>>
>>>>>>>>>> Sent from a remote device. Please excuse any typos...
>>>>>>>>>>
>>>>>>>>>> Mike Segel
>>>>>>>>>>
>>>>>>>>>> On May 3, 2012, at 5:25 AM, Austin Chungath <austi...@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> Yes. This was first posted on the Cloudera mailing list. There
>>>>>>>>>>> were no responses.
>>>>>>>>>>>
>>>>>>>>>>> But this is not related to Cloudera as such.
>>>>>>>>>>>
>>>>>>>>>>> cdh3 uses apache hadoop 0.20 as the base. My data is in apache
>>>>>>>>>>> hadoop 0.20.205.
>>>>>>>>>>>
>>>>>>>>>>> There is an upgrade namenode option when we are migrating to a
>>>>>>>>>>> higher version, say from 0.20 to 0.20.205,
>>>>>>>>>>> but here I am downgrading from 0.20.205 to 0.20 (cdh3).
>>>>>>>>>>> Is this possible?
>>>>>>>>>>>
>>>>>>>>>>> On Thu, May 3, 2012 at 3:25 PM, Prashant Kommireddi <prash1...@gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Seems like a matter of upgrade. I am not a Cloudera user, so I
>>>>>>>>>>>> would not know much, but you might find some help moving this
>>>>>>>>>>>> to the Cloudera mailing list.
>>>>>>>>>>>>
>>>>>>>>>>>> On Thu, May 3, 2012 at 2:51 AM, Austin Chungath <austi...@gmail.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> There is only one cluster. I am not copying between clusters.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Say I have a cluster running apache 0.20.205 with 10 TB of
>>>>>>>>>>>>> storage capacity and about 8 TB of data.
>>>>>>>>>>>>> Now how can I migrate the same cluster to use cdh3 and that
>>>>>>>>>>>>> same 8 TB of data?
>>>>>>>>>>>>>
>>>>>>>>>>>>> I can't copy 8 TB of data using distcp because I have only
>>>>>>>>>>>>> 2 TB of free space.
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Thu, May 3, 2012 at 3:12 PM, Nitin Pawar <nitinpawar...@gmail.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> You can actually look at distcp:
>>>>>>>>>>>>>> http://hadoop.apache.org/common/docs/r0.20.0/distcp.html
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> but this means that you have two different clusters available
>>>>>>>>>>>>>> to do the migration.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Thu, May 3, 2012 at 12:51 PM, Austin Chungath <austi...@gmail.com> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Thanks for the suggestions.
>>>>>>>>>>>>>>> My concern is that I can't actually copyToLocal from the dfs
>>>>>>>>>>>>>>> because the data is huge.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Say if my hadoop was 0.20 and I am upgrading to 0.20.205, I
>>>>>>>>>>>>>>> can do a namenode upgrade. I don't have to copy data out of
>>>>>>>>>>>>>>> the dfs.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> But here I have Apache hadoop 0.20.205 and I want to use CDH3
>>>>>>>>>>>>>>> now, which is based on 0.20.
>>>>>>>>>>>>>>> Now it is actually a downgrade, as 0.20.205's namenode info
>>>>>>>>>>>>>>> has to be used by 0.20's namenode.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Any idea how I can achieve what I am trying to do?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Thanks.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Thu, May 3, 2012 at 12:23 PM, Nitin Pawar <nitinpawar...@gmail.com> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I can think of the following options:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> 1) write simple get and put code which reads the data from
>>>>>>>>>>>>>>>> the old DFS and loads it into the new dfs
>>>>>>>>>>>>>>>> 2) see if distcp between both versions is compatible
>>>>>>>>>>>>>>>> 3) this is what I had done (and my data was hardly a few
>>>>>>>>>>>>>>>> hundred GB): did a dfs -copyToLocal and then, on the new
>>>>>>>>>>>>>>>> grid, a copyFromLocal
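For completeness, option 3 in command form: a sketch with placeholder paths, and only viable when there is storage outside HDFS large enough to stage the data (which, per the message above, is not the case here):

    # on the old (0.20.205) grid: pull the data out of HDFS
    hadoop dfs -copyToLocal /data /mnt/staging/data
    # with the client pointed at the new (CDH3) grid: push it back in
    hadoop dfs -copyFromLocal /mnt/staging/data /data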
>>>>>>>>>>>>>>>> On Thu, May 3, 2012 at 11:41 AM, Austin Chungath <austi...@gmail.com> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>>>> I am migrating from Apache hadoop 0.20.205 to CDH3u3.
>>>>>>>>>>>>>>>>> I don't want to lose the data that is in the HDFS of
>>>>>>>>>>>>>>>>> Apache hadoop 0.20.205.
>>>>>>>>>>>>>>>>> How do I migrate to CDH3u3 while keeping the data that I
>>>>>>>>>>>>>>>>> have on 0.20.205?
>>>>>>>>>>>>>>>>> What are the best practices/techniques for doing this?
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Thanks & Regards,
>>>>>>>>>>>>>>>>> Austin
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>> Nitin Pawar
>>
>> --
>> Nitin Pawar