Hi,

I am in the process of following your guidelines. 

I would like to know:

1. How does block size impact the performance of a MapReduce job?
2. Does performance improve if I set up the NameNode and JobTracker on
different machines? At present, I am running the NameNode and JobTracker
on the same master machine, connected to 2 slave machines running the
DataNode and TaskTracker.
3. What should the replication factor be for a 3-node cluster?
4. How does io.sort.mb impact the performance of the cluster?
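
For reference, these are the parameters I am looking at in hadoop-site.xml
(the values below are just the stock defaults from hadoop-default.xml, not
tuned recommendations):

```xml
<!-- hadoop-site.xml: site-specific overrides of hadoop-default.xml -->
<configuration>
  <property>
    <name>dfs.block.size</name>
    <value>67108864</value><!-- HDFS block size in bytes (64 MB default) -->
  </property>
  <property>
    <name>dfs.replication</name>
    <value>3</value><!-- default replication factor -->
  </property>
  <property>
    <name>io.sort.mb</name>
    <value>100</value><!-- map-side sort buffer size, in MB -->
  </property>
</configuration>
```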

Thanks,
Sandeep 
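
P.S. For the iostat step you suggested below, this is roughly how I plan to
capture and summarize the write rates (a sketch only; the wMB/s column
position varies across sysstat versions, so the field number is an
assumption to check against the header line of your own output):

```shell
# Capture extended device stats alongside the job (sysstat's iostat;
# -d = device report, -m = rates in MB/s, -x = extended statistics):
#   iostat -dmx 5 > iostat.log &
# Below, a tiny hand-made sample stands in for the captured log.
cat > iostat.log <<'EOF'
sda  0.00  4.00  0.10 52.00  0.00 26.00  996.0  1.2 22.9 1.8  9.6
sda  0.00  6.00  0.20 60.00  0.00 30.00 1024.0  1.5 24.1 1.9 11.4
EOF

# Average write throughput: wMB/s is field 7 in this sample layout
# (assumption -- check the header line of your own iostat output).
awk '{ sum += $7; n++ } END { printf "avg write MB/s: %.1f\n", sum / n }' iostat.log
# -> avg write MB/s: 28.0
```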


Brian Bockelman wrote:
> 
> Hey Sandeep,
> 
> I'd do a couple of things:
> 1) Run your test.  Do something which will be similar to your actual  
> workflow.
> 2) Save the resulting Ganglia plots.  This will give you a hint as to  
> where things are bottlenecking (memory, CPU, wait I/O).
> 3) Watch iostat and find out the I/O rates during the test.  Compare
> this to the I/O rates of a known I/O benchmark (e.g., Bonnie++).
> 4) Finally, watch the logfiles closely.  If you start to overload  
> things, you'll usually get a pretty good indication from Hadoop where  
> things go wrong.  Once something does go wrong, *then* look through  
> the parameters to see what can be done.
> 
> There's about a hundred things which can go wrong between the kernel,  
> the OS, Java, and the application code.  It's difficult to make an  
> educated guess beforehand without some hint from the data.
> 
> Brian
> 
> On Dec 31, 2008, at 1:30 AM, Sandeep Dhawan wrote:
> 
>>
>> Hi Brian,
>>
>> That's exactly my issue, i.e. "How do I ascertain the bottleneck?" In
>> other words, if the results obtained after performance testing are not
>> up to the mark, how do I find the bottleneck?
>>
>> How can we confidently say that the OS and hardware are the culprits? I
>> understand that using the latest OS and hardware can improve
>> performance irrespective of the application, but my real worry is
>> "What next?" How can I further increase the performance? What should I
>> look for that can suggest or point to the areas which can be potential
>> problems or "hotspots"?
>>
>> Thanks for your comments.
>>
>> ~Sandeep~
>>
>>
>> Brian Bockelman wrote:
>>>
>>> Hey Sandeep,
>>>
>>> I would warn against premature optimization: first, run your test,
>>> then see how far from your target you are.
>>>
>>> Of course, I'd wager you'd find that the hardware you are using is
>>> woefully underpowered and that your OS is 5 years old.
>>>
>>> Brian
>>>
>>> On Dec 30, 2008, at 5:57 AM, Sandeep Dhawan wrote:
>>>
>>>>
>>>> Hi,
>>>>
>>>> I am trying to create a Hadoop cluster which can handle 2000 write
>>>> requests per second.
>>>> In each write request I would be writing a line of size 1KB to a file.
>>>>
>>>> I would be using machines with the following configuration:
>>>> Platform: Red Hat Linux 9.0
>>>> CPU: 2.07 GHz
>>>> RAM: 1GB
>>>>
>>>> Can anyone help by giving me some pointers/guidelines on how to go
>>>> about setting up such a cluster?
>>>> What are the configuration parameters in Hadoop that we can tweak to
>>>> enhance the performance of the cluster?
>>>>
>>>> Thanks,
>>>> Sandeep
>>>> -- 
>>>> View this message in context:
>>>> http://www.nabble.com/Performance-testing-tp21216266p21216266.html
>>>> Sent from the Hadoop core-user mailing list archive at Nabble.com.
>>>
>>>
>>>
>>
>> -- 
>> View this message in context:
>> http://www.nabble.com/Performance-testing-tp21216266p21228264.html
>> Sent from the Hadoop core-user mailing list archive at Nabble.com.
> 
> 
> 

-- 
View this message in context: 
http://www.nabble.com/Performance-testing-tp21216266p21548160.html
Sent from the Hadoop core-user mailing list archive at Nabble.com.
