I looked into different clusters and configurations from Cloudera and came up with these numbers. Let me know what you think.

Machine
23 GB of memory
33.5 EC2 Compute Units (2 x Intel Xeon X5570, quad-core “Nehalem” architecture)
1690 GB of instance storage
64-bit platform
I/O Performance: Very High (10 Gigabit Ethernet)
API name: cc1.4xlarge

MAX_MAP_TASKS=16 - mapred.tasktracker.map.tasks.maximum
MAX_REDUCE_TASKS=8 - mapred.tasktracker.reduce.tasks.maximum
CHILD_OPTS=-Xmx1024m - mapred.child.java.opts
CHILD_ULIMIT=1392640 - mapred.child.ulimit

Regards,
Aleksandr

--- On Tue, 5/24/11, Aleksandr Elbakyan <[email protected]> wrote:

From: Aleksandr Elbakyan <[email protected]>
Subject: EC2 cloudera cc1.4xlarge
To: [email protected]
Date: Tuesday, May 24, 2011, 4:23 PM

Hello,

I want to use a cc1.4xlarge cluster for some data processing, and I am using the Cloudera scripts to spin up clusters. hadoop-ec2-init-remote.sh has default configurations up to c1.xlarge but no configuration for cc1.4xlarge. Can someone give me the formula for how these values are calculated based on the hardware?

C1.XLARGE
MAX_MAP_TASKS=8 - mapred.tasktracker.map.tasks.maximum
MAX_REDUCE_TASKS=4 - mapred.tasktracker.reduce.tasks.maximum
CHILD_OPTS=-Xmx680m - mapred.child.java.opts
CHILD_ULIMIT=1392640 - mapred.child.ulimit

I am guessing, but I think

CHILD_OPTS = (total RAM on the box - 1 GB) / (MAX_MAP_TASKS, MAX_REDUCE_TASKS)

But I am not sure how to calculate the rest.

Regards,
Aleksandr
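
For reference, the CHILD_OPTS guess above can be written out as a small calculation. This is only a sketch of that guess, not the way the Cloudera script derives its defaults. It assumes the divisor means the total number of slots (MAX_MAP_TASKS + MAX_REDUCE_TASKS), which is one possible reading of the formula, and the function name child_heap_mb and the reserved_mb parameter are made up for illustration.

```python
def child_heap_mb(total_ram_mb, max_map_tasks, max_reduce_tasks, reserved_mb=1024):
    """Per-child heap in MB under the guessed formula.

    Assumption: the divisor is the total slot count (map + reduce).
    reserved_mb is the 1 GB held back for the OS and Hadoop daemons.
    """
    slots = max_map_tasks + max_reduce_tasks
    if slots <= 0:
        raise ValueError("need at least one task slot")
    # Integer division: round down so the children never exceed the budget.
    return (total_ram_mb - reserved_mb) // slots


# cc1.4xlarge values from this thread: 23 GB RAM, 16 map and 8 reduce slots.
cc1_4xlarge_ram_mb = 23 * 1024
heap = child_heap_mb(cc1_4xlarge_ram_mb, max_map_tasks=16, max_reduce_tasks=8)
print(f"CHILD_OPTS=-Xmx{heap}m")  # prints CHILD_OPTS=-Xmx938m
```

Under that reading, 23 GB with 16 + 8 slots works out to about 938 MB per child, a little under the -Xmx1024m proposed above, so it may be worth checking whether all 24 slots at 1024 MB fit in memory at once.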
