Also, if you have the program you used to do the insertions and could attach it, I'd be interested in trying to replicate this on a test cluster. If you can't redistribute it, I can start from scratch, but it would be easier to run yours.
Thanks
-Todd

On Wed, Jan 13, 2010 at 1:31 PM, Todd Lipcon <t...@cloudera.com> wrote:
> Hi Zlatin,
>
> This is a very interesting test you've run, and certainly not expected
> results. I know of many clusters happily chugging along with millions of
> blocks, so problems at 400K are very strange. By any chance were you able to
> collect profiling information from the NameNode while running this test?
>
> That said, I hope you've set the block size to 1KB for the purpose of this
> test and not because you expect to run that in production. Recommended block
> sizes are at least 64MB and often 128MB or 256MB for larger clusters.
>
> Thanks
> -Todd
>
> On Wed, Jan 13, 2010 at 1:21 PM, <zlatin.balev...@barclayscapital.com> wrote:
>
>> Greetings,
>>
>> I am testing how HDFS scales with a very large number of blocks. I did
>> the following setup:
>>
>> Set the default block size to 1KB
>> Started 8 insert processes, each inserting a 16MB file
>> Repeated the insert 3 times, keeping the already inserted files in HDFS
>> Repeated the entire experiment on one cluster with 4 and another with 11
>> identical datanodes (allocated through HOD)
>>
>> Results:
>> The first 128MB (2^18 blocks) insert finished in 5 minutes. The second
>> in 12 minutes. The third didn't finish within 1 hour. The 11-node
>> cluster was marginally faster.
>>
>> Throughout this I was storing all available metrics. There were no
>> signs of insufficient memory on any of the nodes; CPU usage and garbage
>> collections were constant throughout. If anyone is interested I can
>> provide the recorded metrics. I've attached a chart that looks clearly
>> logarithmic.
>>
>> Can anyone please point to what could be the bottleneck here? I'm
>> evaluating HDFS for usage scenarios requiring 2^(a lot more than 18)
>> blocks.
>>
>> Best Regards,
>> Zlatin Balevsky
>>
>> <<insertion_rate_4_and_11_datanodes.JPG>>
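For reference, the block counts implied by the test setup can be worked out directly. This is a minimal sketch, assuming binary units (1KB = 1024 bytes, 1MB = 1024 * 1024 bytes) and one block per 1KB of file data; the constant and function names are illustrative only. Under those assumptions each 128MB round creates 2^17 = 131,072 blocks (the message quotes 2^18), and three rounds total 393,216 blocks, in line with the roughly 400K figure in the reply.

```python
# Worked arithmetic for the HDFS small-block test described in the thread.
# Assumptions (not stated in the original message): binary units, and every
# 1KB of file data becomes exactly one HDFS block. Replication is ignored.

KB = 1024
MB = 1024 * KB

BLOCK_SIZE = 1 * KB   # default block size used in the test
WRITERS = 8           # concurrent insert processes
FILE_SIZE = 16 * MB   # size of the file each process inserts

# Reported wall-clock time per round, in minutes. Round 3 did not
# finish within an hour, so it is left out here.
ROUND_MINUTES = [5, 12]


def blocks_per_round(writers: int, file_size: int, block_size: int) -> int:
    """Logical blocks created by one round (ceiling division per file)."""
    blocks_per_file = -(-file_size // block_size)
    return writers * blocks_per_file


def main() -> None:
    per_round = blocks_per_round(WRITERS, FILE_SIZE, BLOCK_SIZE)
    # bit_length() - 1 is an exact log2 here only because per_round is a power of two.
    print(f"blocks per round: {per_round} (2^{per_round.bit_length() - 1})")
    for rnd in range(1, 4):
        print(f"cumulative blocks after round {rnd}: {rnd * per_round}")
    for rnd, minutes in enumerate(ROUND_MINUTES, start=1):
        rate = per_round / (minutes * 60)
        print(f"round {rnd}: ~{rate:.0f} blocks/s")


if __name__ == "__main__":
    main()
```

With the reported timings this works out to roughly 437 blocks/s for the first round and 182 blocks/s for the second, so the per-block insert rate more than halved between the first and second rounds.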