Hey Zlatin,

Thanks for the explanation and the additional data. I'm a bit busy today but will try to go through the data and reproduce the results later this week.
-Todd

On Wed, Jan 13, 2010 at 2:07 PM, <zlatin.balev...@barclayscapital.com> wrote:

> Todd,
>
> I used a shell script that launched 8 instances of the bin/hadoop fs -put
> utility. After all 8 processes were done and I verified through the web UI
> that the files were inserted, I manually re-launched the script. That
> is why you'll notice that in the metrics there are two short periods without
> any activity (I edited those out from the graph). There were occasional
> NotReplicatedYet exceptions in the logs of those processes, but they were
> occurring at a constant rate.
>
> I did not run a profiler, but that will eventually be the next step. I'm
> attaching the metrics from the namenode and one of the datanodes from the
> experiment with 4 datanodes. They were recorded every 10 seconds. Heap
> size for all processes is 2GB, and while there was occasional CPU usage on
> the Namenode it was never 100% (and there are plenty of cores).
>
> Ultimately the block size will be much larger than the default as the total
> data will be in the 2^(well over 50) range. With this test I am trying to
> determine if there are any bottlenecks at the NameNode component.
>
> Best Regards,
> Zlatin Balevsky
>
> ------------------------------
> *From:* Todd Lipcon [mailto:t...@cloudera.com]
> *Sent:* Wednesday, January 13, 2010 4:34 PM
> *To:* hdfs-user@hadoop.apache.org
> *Subject:* Re: Exponential performance decay when inserting large number
> of blocks
>
> Also, if you have the program you used to do the insertions, and could
> attach it, I'd be interested in trying to replicate this on a test cluster.
> If you can't redistribute it, I can start from scratch, but it would be
> easier to run yours.
>
> Thanks
> -Todd
>
> On Wed, Jan 13, 2010 at 1:31 PM, Todd Lipcon <t...@cloudera.com> wrote:
>
>> Hi Zlatin,
>>
>> This is a very interesting test you've run, and certainly not expected
>> results. I know of many clusters happily chugging along with millions of
>> blocks, so problems at 400K are very strange. By any chance were you able to
>> collect profiling information from the NameNode while running this test?
>>
>> That said, I hope you've set the block size to 1KB for the purpose of this
>> test and not because you expect to run that in production. Recommended block
>> sizes are at least 64MB and often 128MB or 256MB for larger clusters.
>>
>> Thanks
>> -Todd
>>
>> On Wed, Jan 13, 2010 at 1:21 PM, <zlatin.balev...@barclayscapital.com> wrote:
>>
>>> Greetings,
>>>
>>> I am testing how HDFS scales with a very large number of blocks. I used
>>> the following setup:
>>>
>>> Set the default block size to 1KB
>>> Started 8 insert processes, each inserting a 16MB file
>>> Repeated the insert 3 times, keeping the already inserted files in HDFS
>>> Repeated the entire experiment on one cluster with 4 and another with 11
>>> identical datanodes (allocated through HOD)
>>>
>>> Results:
>>> The first 128MB (2^18 blocks) insert finished in 5 minutes. The second
>>> in 12 minutes. The third didn't finish within 1 hour. The 11-node
>>> cluster was marginally faster.
>>>
>>> Throughout this I was storing all available metrics. There were no
>>> signs of insufficient memory on any of the nodes; CPU usage and garbage
>>> collections were constant throughout. If anyone is interested I can
>>> provide the recorded metrics. I've attached a chart that looks clearly
>>> logarithmic.
>>>
>>> Can anyone please point to what could be the bottleneck here? I'm
>>> evaluating HDFS for usage scenarios requiring 2^(a lot more than 18)
>>> blocks.
>>>
>>> <<insertion_rate_4_and_11_datanodes.JPG>>
>>>
>>> Best Regards,
>>> Zlatin Balevsky
>>>
>>> _______________________________________________
>>>
>>> This e-mail may contain information that is confidential, privileged or
>>> otherwise protected from disclosure. If you are not an intended recipient of
>>> this e-mail, do not duplicate or redistribute it by any means. Please delete
>>> it and any attachments and notify the sender that you have received it in
>>> error. Unless specifically indicated, this e-mail is not an offer to buy or
>>> sell or a solicitation to buy or sell any securities, investment products or
>>> other financial product or service, an official confirmation of any
>>> transaction, or an official statement of Barclays. Any views or opinions
>>> presented are solely those of the author and do not necessarily represent
>>> those of Barclays. This e-mail is subject to terms available at the
>>> following link: www.barcap.com/emaildisclaimer. By messaging with
>>> Barclays you consent to the foregoing. Barclays Capital is the investment
>>> banking division of Barclays Bank PLC, a company registered in England
>>> (number 1026167) with its registered office at 1 Churchill Place, London,
>>> E14 5HP. This email may relate to or be sent from other members of the
>>> Barclays Group.
>>> _______________________________________________
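
As a quick sanity check on the block counts discussed in this thread, the arithmetic can be sketched in a few lines of Python. This is only an illustration under stated assumptions: binary units (1KB = 1024 bytes), one block per 1KB of file data, and replicas not counted. The function names are hypothetical and are not part of any Hadoop API. Under those assumptions each round of 8 x 16MB files comes to 2^17 blocks and three rounds to about 393K, which is in line with the roughly 400K figure mentioned in the reply; the 2^18 figure in the original message may have been counted differently.

```python
# Assumption: binary units, as is conventional for HDFS block sizes.
KB = 1024
MB = 1024 * KB


def blocks_per_file(file_size_bytes, block_size_bytes):
    """Number of blocks needed to hold one file (ceiling division)."""
    return -(-file_size_bytes // block_size_bytes)


def blocks_per_round(num_files, file_size_bytes, block_size_bytes):
    """Blocks created by one round of inserts, ignoring replication."""
    return num_files * blocks_per_file(file_size_bytes, block_size_bytes)


# Setup described in the thread: 8 processes, one 16MB file each, 1KB blocks.
per_round = blocks_per_round(8, 16 * MB, 1 * KB)

# Files are kept between rounds, so the block count accumulates.
cumulative = [per_round * r for r in range(1, 4)]

print(per_round)    # 131072, i.e. 2**17
print(cumulative)   # [131072, 262144, 393216]
```

The same functions show how much the block count drops at the recommended block sizes: with 64MB blocks, `blocks_per_round(8, 16 * MB, 64 * MB)` is just 8, one block per file.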