I looked into different cluster configurations from Cloudera and came up with 
these numbers. Let me know what you think.

Machine:

    23 GB of memory
    33.5 EC2 Compute Units (2 x Intel Xeon X5570, quad-core “Nehalem” architecture)
    1690 GB of instance storage
    64-bit platform
    I/O Performance: Very High (10 Gigabit Ethernet)
    API name: cc1.4xlarge

    MAX_MAP_TASKS=16 -  mapred.tasktracker.map.tasks.maximum
    MAX_REDUCE_TASKS=8 - mapred.tasktracker.reduce.tasks.maximum
    CHILD_OPTS=-Xmx1024m - mapred.child.java.opts
    CHILD_ULIMIT=1392640 - mapred.child.ulimit
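
As a rough sanity check on these numbers, the sketch below adds up the worst-case child heap for the proposed settings and compares it with the 23 GB on the box. It assumes every map and reduce slot is busy at the same time and that each child JVM grows to its full -Xmx heap; the function name is made up for illustration and is not part of the Cloudera scripts.

```python
# Rough memory budget for the proposed cc1.4xlarge settings.
# Assumptions (not from the Cloudera scripts): all slots are busy at once,
# every child JVM uses its full -Xmx heap, and JVM overhead beyond the
# heap is ignored.

TOTAL_RAM_MB = 23 * 1024   # 23 GB of memory, from the machine spec
MAX_MAP_TASKS = 16         # mapred.tasktracker.map.tasks.maximum
MAX_REDUCE_TASKS = 8       # mapred.tasktracker.reduce.tasks.maximum
CHILD_HEAP_MB = 1024       # from CHILD_OPTS=-Xmx1024m


def worst_case_heap_mb(map_slots, reduce_slots, heap_mb):
    """Total heap if every task slot runs a child JVM at full heap size."""
    return (map_slots + reduce_slots) * heap_mb


needed_mb = worst_case_heap_mb(MAX_MAP_TASKS, MAX_REDUCE_TASKS, CHILD_HEAP_MB)
headroom_mb = TOTAL_RAM_MB - needed_mb

print(f"worst-case child heap: {needed_mb} MB")
print(f"left for OS and Hadoop daemons: {headroom_mb} MB")
```

Under those assumptions the headroom comes out negative (24 slots at 1024 MB is more than 23 GB), so the heap size or slot counts may need to come down if all slots are expected to fill with memory-hungry tasks.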

Regards,
Aleksandr

--- On Tue, 5/24/11, Aleksandr Elbakyan <[email protected]> wrote:

From: Aleksandr Elbakyan <[email protected]>
Subject: EC2 cloudera cc1.4xlarge
To: [email protected]
Date: Tuesday, May 24, 2011, 4:23 PM

Hello,

I want to use a cc1.4xlarge cluster for some data processing, and I am using 
the Cloudera scripts to spin up clusters. hadoop-ec2-init-remote.sh has default 
configurations up to c1.xlarge but none for cc1.4xlarge. Can someone give the 
formula for how these values are calculated based on the hardware?

C1.XLARGE
    MAX_MAP_TASKS=8 -  mapred.tasktracker.map.tasks.maximum
    MAX_REDUCE_TASKS=4 - mapred.tasktracker.reduce.tasks.maximum
    CHILD_OPTS=-Xmx680m - mapred.child.java.opts
    CHILD_ULIMIT=1392640 - mapred.child.ulimit

I am guessing, but I think:

CHILD_OPTS = (total RAM on the box - 1 GB) / (MAX_MAP_TASKS, MAX_REDUCE_TASKS)

But I am not sure how to calculate the rest.
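
To make the guess above concrete, here is a minimal sketch that evaluates it. Two things are assumptions rather than facts from the scripts: the divisor is read as the sum of map and reduce slots, and c1.xlarge is taken to have 7 GB of RAM (that figure is not stated in this thread). The function name is hypothetical.

```python
def guessed_child_heap_mb(total_ram_mb, max_map_tasks, max_reduce_tasks,
                          reserved_mb=1024):
    """Heap per child JVM in MB under the guessed formula.

    Assumption: "(MAX_MAP_TASKS, MAX_REDUCE_TASKS)" means the sum of both
    slot counts, and 1 GB is reserved for everything else on the box.
    """
    slots = max_map_tasks + max_reduce_tasks
    return (total_ram_mb - reserved_mb) // slots


# c1.xlarge: 8 map / 4 reduce slots; 7 GB RAM is an assumed figure.
c1_xlarge_mb = guessed_child_heap_mb(7 * 1024, 8, 4)

# cc1.4xlarge: 23 GB RAM, with 16 map / 8 reduce slots as a candidate setting.
cc1_4xlarge_mb = guessed_child_heap_mb(23 * 1024, 16, 8)

print(f"c1.xlarge guess:   -Xmx{c1_xlarge_mb}m")
print(f"cc1.4xlarge guess: -Xmx{cc1_4xlarge_mb}m")
```

With those assumptions the guess gives 512 MB for c1.xlarge, which is lower than the script's default of -Xmx680m, so the defaults do not seem to follow this formula exactly.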

Regards,
Aleksandr 

 
