Hi, I am in the process of following your guidelines.
I would like to know:

1. How can block size impact the performance of a MapReduce job?
2. Does performance improve if I set up the NameNode and JobTracker on different machines? At present, I am running the NameNode and JobTracker on the same master machine, interconnected to 2 slave machines running the DataNode and TaskTracker.
3. What should the replication factor be for a 3-node cluster?
4. How does io.sort.mb impact the performance of the cluster?

Thanks,
Sandeep


Brian Bockelman wrote:
>
> Hey Sandeep,
>
> I'd do a couple of things:
> 1) Run your test. Do something which will be similar to your actual workflow.
> 2) Save the resulting Ganglia plots. This will give you a hint as to where things are bottlenecking (memory, CPU, wait I/O).
> 3) Watch iostat and find out the I/O rates during the test. Compare this to the I/O rates of a known I/O benchmark (e.g., Bonnie++).
> 4) Finally, watch the logfiles closely. If you start to overload things, you'll usually get a pretty good indication from Hadoop where things go wrong. Once something does go wrong, *then* look through the parameters to see what can be done.
>
> There are about a hundred things which can go wrong between the kernel, the OS, Java, and the application code. It's difficult to make an educated guess beforehand without some hint from the data.
>
> Brian
>
> On Dec 31, 2008, at 1:30 AM, Sandeep Dhawan wrote:
>
>> Hi Brian,
>>
>> That's exactly my issue, i.e. "How do I ascertain the bottleneck?" In other words, if the results obtained after doing the performance testing are not up to the mark, how do I find the bottleneck?
>>
>> How can we confidently say that the OS and hardware are the culprits? I understand that using the latest OS and hardware can improve performance irrespective of the application, but my real worry is "What next?". How can I further increase the performance?
>> What should I look for which can suggest or point to the areas that may be potential problems or "hotspots"?
>>
>> Thanks for your comments.
>>
>> ~Sandeep~
>>
>>
>> Brian Bockelman wrote:
>>>
>>> Hey Sandeep,
>>>
>>> I would warn against premature optimization: first, run your test, then see how far from your target you are.
>>>
>>> Of course, I'd wager you'd find that the hardware you are using is woefully underpowered and that your OS is 5 years old.
>>>
>>> Brian
>>>
>>> On Dec 30, 2008, at 5:57 AM, Sandeep Dhawan wrote:
>>>
>>>> Hi,
>>>>
>>>> I am trying to create a Hadoop cluster which can handle 2000 write requests per second.
>>>> In each write request I would be writing a line of size 1 KB to a file.
>>>>
>>>> I would be using machines with the following configuration:
>>>> Platform: Red Hat Linux 9.0
>>>> CPU: 2.07 GHz
>>>> RAM: 1 GB
>>>>
>>>> Can anyone help by giving me some pointers/guidelines as to how to go about setting up such a cluster?
>>>> What are the configuration parameters in Hadoop that we can tweak to enhance the performance of the cluster?
>>>>
>>>> Thanks,
>>>> Sandeep
>>>> --
>>>> View this message in context: http://www.nabble.com/Performance-testing-tp21216266p21216266.html
>>>> Sent from the Hadoop core-user mailing list archive at Nabble.com.
>>
>> --
>> View this message in context: http://www.nabble.com/Performance-testing-tp21216266p21228264.html
>> Sent from the Hadoop core-user mailing list archive at Nabble.com.

--
View this message in context: http://www.nabble.com/Performance-testing-tp21216266p21548160.html
Sent from the Hadoop core-user mailing list archive at Nabble.com.
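For what it's worth, questions 1, 3, and 4 above all map onto entries in hadoop-site.xml. The snippet below is only a starting-point sketch for a small 0.x-era cluster (the values are defaults or guesses, not tuned recommendations, and property names may differ in other Hadoop versions):

```xml
<?xml version="1.0"?>
<configuration>
  <!-- Q1: HDFS block size in bytes (64 MB shown, the usual default).
       Larger blocks mean fewer map tasks and less NameNode metadata;
       smaller blocks give more map-side parallelism on a tiny cluster. -->
  <property>
    <name>dfs.block.size</name>
    <value>67108864</value>
  </property>

  <!-- Q3: replication factor. It cannot usefully exceed the number of
       DataNodes, so with 2 slaves a value of 2 is the practical maximum. -->
  <property>
    <name>dfs.replication</name>
    <value>2</value>
  </property>

  <!-- Q4: per-task sort buffer in MB (100 is the usual default). Raising
       it reduces spills to disk during the map-side sort, at the cost of
       task heap memory. -->
  <property>
    <name>io.sort.mb</name>
    <value>100</value>
  </property>
</configuration>
```

On question 2: moving the JobTracker off the NameNode machine mainly helps once either daemon is memory- or CPU-bound. With only 1 GB of RAM per machine that is plausible, but the stated load (2000 req/s x 1 KB, roughly 2 MB/s aggregate) is more likely to be limited by disk and replication traffic than by daemon placement; the Ganglia plots Brian suggests should show which.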

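Brian's step 3 (comparing observed I/O rates during the test against a benchmark figure) can be sketched roughly as below. The iostat output is inlined as sample data here, since the real column layout varies by sysstat version; in practice you would capture it with something like `iostat -d -k 5`:

```shell
#!/bin/sh
# Sketch: sum per-device write throughput from iostat-style "-d -k" output.
# Columns assumed: device, tps, kB_read/s, kB_wrtn/s (sample data, not a
# real capture).
iostat_sample='sda 120.00 2048.00 4096.00
sdb  80.00 1024.00 2048.00'

# Add up the write-rate column (field 4) across all devices.
total_write_kbs=$(printf '%s\n' "$iostat_sample" | awk '{sum += $4} END {print sum}')

echo "aggregate write throughput: ${total_write_kbs} kB/s"
# Compare this figure against your Bonnie++ sequential-write number to see
# how close the job gets to the raw-disk ceiling.
```

If the job-time rate sits well below the benchmark rate, the bottleneck is probably not the disks, and the Ganglia CPU/memory plots become the next place to look.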