Re: FW: How to run multiple data nodes and multiple task trackers on single server.

Xiaobo Gu Mon, 04 Jul 2011 20:34:44 -0700

Hi Kumar,

Thanks for your reply. There are two fundamental questions:
1. What are the advantages of running multiple data nodes on a single
computer, given that one data node instance can read multiple data
directories too?
2. Are there advantages to running multiple task trackers? I think the
point of running multiple task trackers is to use more CPU cores. Can a
larger number, such as 4 or 8, for mapred.tasktracker.reduce.tasks.maximum
and mapred.tasktracker.map.tasks.maximum with only one task tracker
achieve the same purpose?


Regards,

Xiaobo Gu

On Mon, Jul 4, 2011 at 12:34 PM, Kumar Kandasami
<kumaravel.kandas...@gmail.com> wrote:
> Xiaobo Gu:
>
>    Since you are trying an approach that is not commonly used,
> here are some ways I would approach this problem:
>
> Experiment with different configuration settings, and measure the performance
> of the cluster using a Terabyte sort job submitted to the cluster. That gives
> you a benchmark to see whether a change has improved the cluster's
> performance.
>
> I am not sure whether you need multiple task trackers, given that you are
> running multiple data nodes on the same machine.
>
> The mapred.tasktracker.map.tasks.maximum attribute sets how many separate
> JVMs can be opened to run the map tasks.
> The memory allocated to each of these tasks can be controlled with
> mapred.child.java.opts.
>
> So say you have 32 processors - then generally you can run 64 JVM
> processes, allocating one to each of the Hadoop daemons you have:
>
> SecondaryNameNode - 1
> NameNode - 1
> JobTracker - 1
> DataNode - 8
> TaskTracker - 1
>
>
> That leaves a maximum of 52 JVM processes that you could allocate between map
> and reduce tasks. You could allocate a maximum of approximately 3-4 GB per
> map/reduce task (if needed).
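
The estimate above can be written out as a small calculation. This is only a
sketch of the arithmetic: the function name task_jvm_budget and the
jvms_per_core parameter are made up for illustration, and the ratio of 2 JVMs
per core is the rule of thumb from the message, not a Hadoop setting.

```python
# Rough JVM budget for one server, following the estimate in the message.
# Assumption: 2 JVMs per core is a rule of thumb, not a Hadoop parameter.

# Daemon counts as listed in the message (one JVM each).
DAEMONS = {
    "SecondaryNameNode": 1,
    "NameNode": 1,
    "JobTracker": 1,
    "DataNode": 8,
    "TaskTracker": 1,
}


def task_jvm_budget(cores, jvms_per_core=2, daemons=DAEMONS):
    """Return how many JVMs remain for map and reduce tasks."""
    total_jvms = cores * jvms_per_core
    return total_jvms - sum(daemons.values())


budget = task_jvm_budget(32)
print(budget)  # 52
```

How those 52 slots would be split between mapred.tasktracker.map.tasks.maximum
and mapred.tasktracker.reduce.tasks.maximum is a separate choice that the
benchmark runs would have to inform.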
>
> Concern: a larger number of map/reduce tasks might cause a disk access
> bottleneck; this may also depend on the HDD configuration.
>
> Hope these comments help you succeed further. Please keep me posted with
> your benchmark results and configuration.
>
>
>
>
> Kumar    _/|\_
> www.saisk.com
> ku...@saisk.com
> "making a profound difference with knowledge and creativity..."
>
>
> On Sun, Jul 3, 2011 at 10:30 PM, Xiaobo Gu <guxiaobo1...@gmail.com> wrote:
>>
>> Hi Kumar,
>>
>> Thanks for your reply. Can you have a look at this, please?
>>
>> Regards,
>>
>> Xiaobo Gu
>>
>>
>>
>>
>> ---------- Forwarded message ----------
>> From: Xiaobo Gu <guxiaobo1...@gmail.com>
>> Date: Sun, Jul 3, 2011 at 10:07 PM
>> Subject: Re: FW: How to run multiple data nodes and multiple task
>> trackers on single server.
>> To: mapreduce-user@hadoop.apache.org
>>
>>
>> Hi Harsh
>> I have successfully run 2 data nodes in a single virtual machine,
>> and we will deploy 4 or 8 data nodes on our big SMP server, which has
>> 32 CPU cores and 256G RAM. In order to take full advantage of all the
>> resources, do we need to configure more task trackers too, or can we
>> set mapred.tasktracker.map.tasks.maximum and
>> mapred.tasktracker.reduce.tasks.maximum to a larger number such as 8
>> or 16 to achieve the same purpose?
>>
>> We have seen this
>> http://developer.yahoo.com/events/hadoopsummit2011/agenda.html#21, but
>> have not found any more details. We think a multiple data node
>> configuration on big SMP servers is a good starting point.
>>
>> Regards,
>>
>> Xiaobo Gu
>>
>> On Sun, Jul 3, 2011 at 9:56 PM, Harsh J <ha...@cloudera.com> wrote:
>> > On Sun, Jul 3, 2011 at 9:41 AM, XiaoboGu <guxiaobo1...@gmail.com> wrote:
>> >>> Hi,
>> >>>
>> >>> Do we have to run multiple task trackers when running multiple data
>> >>> nodes on a single
>> >>> computer?
>> >>>
>> >>> Regards,
>> >>>
>> >>> Xiaobo Gu
>> >>>
>> >
>> > Do we _have_ to? --> No, it's a matter of your choice whether you want
>> > MapReduce daemons running alongside. They are not coupled.
>> >
>> > Regarding your original question, what's the string of "$DN_CONF_OPTS"
>> > being passed?
>> >
>> > --
>> > Harsh J
>> >
>
>
