Hi Bertrand,

A gateway machine is one that is usually used to connect to the Hadoop cluster; however, the machine itself does not run a DataNode/TaskTracker.

Warm Regards,
Sumit
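The gateway approach under discussion boils down to attaching the data volume to the gateway machine and pushing files into HDFS with copyFromLocal. A minimal sketch — the paths, the parallelism factor, and the mount point /mnt/ingest are hypothetical, and the hadoop commands assume a configured client on the gateway:

```shell
# Hypothetical source (volume mounted on the gateway) and HDFS destination.
SRC=/mnt/ingest
DEST=/user/sumit/raw

# The actual upload (requires a configured hadoop client on the gateway):
#   hadoop fs -mkdir -p "$DEST"
#   hadoop fs -copyFromLocal "$SRC"/* "$DEST"/
#
# For large loads, a common trick is to run several copies in parallel,
# one per file (8 concurrent uploads here, adjust to taste):
#   ls "$SRC" | xargs -n1 -P8 -I{} hadoop fs -copyFromLocal "$SRC"/{} "$DEST"/

echo "copyFromLocal: $SRC -> $DEST"
```

Note that copyFromLocal streams through the single gateway host, so its network interface becomes the bottleneck; for petabyte volumes that is exactly why physically moving the disks (or spreading the load over several gateway hosts) comes up in this thread.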
________________________________
From: Bertrand Dechoux <decho...@gmail.com>
To: common-user@hadoop.apache.org; sumit ghosh <sumi...@yahoo.com>
Sent: Tuesday, 30 October 2012 4:40 PM
Subject: Re: Loading Data to HDFS

I don't know what you mean by gateway, but in order to have a rough idea of the time needed you need three values:
* the amount of data you want to put on Hadoop
* Hadoop bandwidth with regard to local storage (read/write)
* bandwidth between where your data is stored and where the Hadoop cluster is

For the latter, for big volumes, physically moving the volumes is a viable solution. It will depend on your constraints, of course: budget, speed...

Bertrand

On Tue, Oct 30, 2012 at 11:39 AM, sumit ghosh <sumi...@yahoo.com> wrote:

> Hi Bertrand,
>
> By physically moving the data, do you mean that the data volume is connected
> to the gateway machine and the data is loaded from the local copy using
> copyFromLocal?
>
> Thanks,
> Sumit
>
> ________________________________
> From: Bertrand Dechoux <decho...@gmail.com>
> To: common-user@hadoop.apache.org; sumit ghosh <sumi...@yahoo.com>
> Sent: Tuesday, 30 October 2012 3:46 PM
> Subject: Re: Loading Data to HDFS
>
> It might sound like a deprecated way, but can't you move the data physically?
> From what I understand, it is one shot and not "streaming", so it could be a
> good method if you have the access, of course.
>
> Regards
>
> Bertrand
>
> On Tue, Oct 30, 2012 at 11:07 AM, sumit ghosh <sumi...@yahoo.com> wrote:
>
>> Hi,
>>
>> I have data on a remote machine accessible over ssh. I have Hadoop CDH4
>> installed on RHEL. I am planning to load quite a few petabytes of data onto
>> HDFS.
>>
>> Which will be the fastest method to use, and are there any projects around
>> Hadoop which can be used as well?
>>
>> I cannot install Hadoop-Client on the remote machine.
>>
>> Have a great Day Ahead!
>> Sumit.
>>
>> ---------------
>> Here I am attaching my previous discussion on CDH-user to avoid
>> duplication.
>> ---------------
>> On Wed, Oct 24, 2012 at 9:29 PM, Alejandro Abdelnur <t...@cloudera.com> wrote:
>>
>> In addition to Jarcec's suggestions, you could use HttpFS. Then you'd only
>> need to poke a single host:port in your firewall, as all the traffic goes
>> through it.
>> thx
>> Alejandro
>>
>> On Oct 24, 2012, at 8:28 AM, Jarek Jarcec Cecho <jar...@cloudera.com> wrote:
>> > Hi Sumit,
>> > There are plenty of ways to achieve that. Please find my feedback below:
>> >
>> >> Does Sqoop support loading flat files to HDFS?
>> >
>> > No, Sqoop only supports moving data from external database and
>> > warehouse systems. Copying files is not supported at the moment.
>> >
>> >> Can I use distcp?
>> >
>> > No. Distcp can be used only to copy data between HDFS filesystems.
>> >
>> >> How do we use the core-site.xml file on the remote machine to use
>> >> copyFromLocal?
>> >
>> > Yes, you can install the Hadoop binaries on your machine (with no Hadoop
>> > services running) and use the hadoop binary to upload data. The installation
>> > procedure is described in the CDH4 installation guide [1] (follow the
>> > "client" installation).
>> >
>> > Another way that I can think of is leveraging WebHDFS [2] or maybe
>> > hdfs-fuse [3].
>> >
>> > Jarcec
>> >
>> > Links:
>> > 1: https://ccp.cloudera.com/display/CDH4DOC/CDH4+Installation
>> > 2: https://ccp.cloudera.com/display/CDH4DOC/Deploying+HDFS+on+a+Cluster#DeployingHDFSonaCluster-EnablingWebHDFS
>> > 3: https://ccp.cloudera.com/display/CDH4DOC/Mountable+HDFS
>> >
>> > On Wed, Oct 24, 2012 at 01:33:29AM -0700, Sumit Ghosh wrote:
>> >> Hi,
>> >>
>> >> I have data on a remote machine accessible over ssh. What is the fastest
>> >> way to load data onto HDFS?
>> >>
>> >> Does Sqoop support loading flat files to HDFS?
>> >> Can I use distcp?
>> >> How do we use the core-site.xml file on the remote machine to use
>> >> copyFromLocal?
>> >>
>> >> Which will be the best to use, and are there any other open source projects
>> >> around Hadoop which can be used as well?
>> >>
>> >> Have a great Day Ahead!
>> >> Sumit
>
> --
> Bertrand Dechoux

--
Bertrand Dechoux
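The WebHDFS route mentioned above is a two-step REST call: the NameNode answers the first PUT with a 307 redirect to a DataNode, and the file body goes only in the second PUT. A minimal sketch — the host namenode.example.com, the user name, and the HDFS path are hypothetical, and 50070 assumes the era's default NameNode HTTP port:

```shell
# Hypothetical NameNode host, WebHDFS port, and target HDFS path.
NAMENODE=namenode.example.com
PORT=50070
HDFS_PATH=/user/sumit/raw/file.txt

# Step 1: ask the NameNode for a write location. It sends no data back
# except a 307 redirect whose Location: header points at a DataNode.
CREATE_URL="http://${NAMENODE}:${PORT}/webhdfs/v1${HDFS_PATH}?op=CREATE&user.name=sumit"
#   curl -i -X PUT "$CREATE_URL"
#
# Step 2: PUT the actual file to the Location: URL returned by step 1:
#   curl -i -X PUT -T file.txt "<Location header from step 1>"

echo "$CREATE_URL"
```

This fits Alejandro's HttpFS suggestion as well: HttpFS speaks the same REST API, but the redirect target is the HttpFS host itself, so only that one host:port needs to be opened in the firewall.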