Also check the I/O wait time on your datanodes. If the I/O wait is high, you can't win.
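For anyone who wants a quick number rather than a Ganglia plot, here is a minimal sketch of one way to measure I/O wait. It assumes Linux datanodes that expose /proc/stat, and the function names are my own, not part of any Hadoop tool. It takes two samples of the aggregate "cpu" line and reports the share of CPU time spent in iowait between them.

```python
import time


def parse_cpu_line(line):
    """Return the numeric fields of the aggregate 'cpu' line from /proc/stat."""
    parts = line.split()
    if not parts or parts[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    fields = [int(x) for x in parts[1:]]
    if len(fields) < 5:
        raise ValueError("this kernel does not report an iowait field")
    return fields


def iowait_fraction(before, after):
    """Fraction of CPU time spent in iowait between two 'cpu' line samples."""
    b = parse_cpu_line(before)
    a = parse_cpu_line(after)
    deltas = [y - x for x, y in zip(b, a)]
    # Fields: user nice system idle iowait irq softirq steal [guest guest_nice].
    # Guest time is already counted inside user/nice, so sum only the first 8.
    total = sum(deltas[:8])
    if total <= 0:
        return 0.0
    return deltas[4] / total  # index 4 is iowait


def read_cpu_line(path="/proc/stat"):
    """Read the first line of /proc/stat, which is the aggregate 'cpu' line."""
    with open(path) as f:
        return f.readline()


def sample_iowait(interval=1.0):
    """Sample /proc/stat twice, 'interval' seconds apart (Linux only)."""
    before = read_cpu_line()
    time.sleep(interval)
    after = read_cpu_line()
    return iowait_fraction(before, after)
```

Calling sample_iowait(5.0) on each datanode while the put is running gives a rough figure to compare across nodes. What counts as "high" depends on your hardware, so treat the value as a relative indicator rather than a threshold.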

On Fri, Jun 12, 2009 at 11:24 AM, Brian Bockelman <bbock...@cse.unl.edu> wrote:

> What's your replication factor? What aggregate I/O rates do you see in
> Ganglia? Is the I/O spiky, or has it plateaued?
>
> We can hit close to network rate (1Gbps) per node locally, and have pretty
> similar hardware.
>
> Brian
>
> On Jun 12, 2009, at 9:03 AM, Scott wrote:
>
>> I ran the put command on 3 of the nodes simultaneously to copy files that
>> were local on those machines into the hdfs.
>>
>> Brian Bockelman wrote:
>>
>>> What'd you do for the tests? Was it a single stream or a multiple stream
>>> test?
>>>
>>> Brian
>>>
>>> On Jun 12, 2009, at 6:48 AM, Scott wrote:
>>>
>>>> So is ~ 1GB/minute transfer rate a reasonable performance benchmark?
>>>> Our test cluster consists of 4 quad core xeon machines with 2 non-raided
>>>> drives each. My initial tests show a transfer rate of around 1GB/minute,
>>>> and that was slower than I expected it to be.
>>>>
>>>> Thanks,
>>>> Scott
>>>>
>>>> Brian Bockelman wrote:
>>>>
>>>>> Hey Sugandha,
>>>>>
>>>>> Transfer rates depend on the quality/quantity of your hardware and the
>>>>> quality of your client disk that is generating the data. I usually say
>>>>> that you should expect near-hardware-bottleneck speeds for an otherwise
>>>>> idle cluster.
>>>>>
>>>>> There should be no "make it fast" required (though you should review
>>>>> the logs for errors if it's going slow). I would expect a 5GB file to
>>>>> take around 3-5 minutes to write on our cluster, but it's a well-tuned
>>>>> and operational cluster.
>>>>>
>>>>> As Todd (I think) mentioned before, we can't help any when you say "I
>>>>> want to make it faster". You need to provide diagnostic information -
>>>>> logs, Ganglia plots, stack traces, something - that folks can look at.
>>>>>
>>>>> Brian
>>>>>
>>>>> On Jun 10, 2009, at 2:25 AM, Sugandha Naolekar wrote:
>>>>>
>>>>>> But if I want to make it fast, then??? I want to place the data in
>>>>>> HDFS and replicate it in a fraction of a second. Is that possible,
>>>>>> and how?
>>>>>>
>>>>>> On Wed, Jun 10, 2009 at 2:47 PM, kartik saxena <kartik....@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> I would suppose about 2-3 hours. It took me some 2 days to load a
>>>>>>> 160 Gb file.
>>>>>>> Secura
>>>>>>>
>>>>>>> On Wed, Jun 10, 2009 at 11:56 AM, Sugandha Naolekar
>>>>>>> <sugandha....@gmail.com> wrote:
>>>>>>>
>>>>>>>> Hello!
>>>>>>>>
>>>>>>>> If I try to transfer a 5GB VDI file from a remote host (not a part
>>>>>>>> of the hadoop cluster) into HDFS, and get it back, how much time is
>>>>>>>> it supposed to take?
>>>>>>>>
>>>>>>>> No map-reduce involved. Simply writing files in and out of HDFS
>>>>>>>> through simple Java code (usage of the APIs).
>>>>>>>>
>>>>>>>> --
>>>>>>>> Regards!
>>>>>>>> Sugandha
>>>>>>
>>>>>> --
>>>>>> Regards!
>>>>>> Sugandha

--
Pro Hadoop, a book to guide you from beginner to hadoop mastery,
http://www.apress.com/book/view/9781430219422
www.prohadoopbook.com a community for Hadoop Professionals