[ https://issues.apache.org/jira/browse/HADOOP-4585?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Karl Anderson updated HADOOP-4585:
----------------------------------

    Description:
src/contrib/ec2/bin/image/hadoop-init is appended to rc.local on all ec2 cluster boxes. This shell script generates the hadoop-site.xml configuration file. It starts with some default settings, which are used to populate the file. These defaults are then overwritten by the user data (from hadoop-ec2-env.sh) passed to the EC2 instance by launch-hadoop-master and launch-hadoop-slaves.

This isn't a bug; setting variables in hadoop-ec2-env.sh does the right thing. However, it's dead and misleading code (well, it misled me) and running a test Hadoop job to figure out what's going on takes a little effort.

Suggested change to hadoop-init:

Remove these lines:
{noformat}
# set defaults
MAX_TASKS=3
[ "$INSTANCE_TYPE" == "m1.large" ] && MAX_TASKS=6
[ "$INSTANCE_TYPE" == "m1.xlarge" ] && MAX_TASKS=12
MAX_MAP_TASKS=$MAX_TASKS
MAX_REDUCE_TASKS=$MAX_TASKS
{noformat}

Add a comment before the lines which access the user data:
{noformat}
# get user data passed in by the ec2 instance launch
wget -q -O - http://169.254.169.254/latest/user-data | tr ',' '\n' > /tmp/user-data
source /tmp/user-data
{noformat}

  was:
src/contrib/ec2/bin/image/hadoop-init is appended to rc.local on all ec2 cluster boxes. This shell script generates the hadoop-site.xml configuration file. It starts with some default settings, which are used to populate the file. These defaults are then overwritten by the user data (from hadoop-ec2-env.sh) passed to the EC2 instance by launch-hadoop-master and launch-hadoop-slaves.

This isn't a bug; setting variables in hadoop-ec2-env.sh does the right thing. However, it's dead and misleading code (well, it misled me) and running a test Hadoop job to figure out what's going on takes a little effort.

Suggested change to hadoop-init:

Remove these lines:
# set defaults
MAX_TASKS=3
[ "$INSTANCE_TYPE" == "m1.large" ] && MAX_TASKS=6
[ "$INSTANCE_TYPE" == "m1.xlarge" ] && MAX_TASKS=12
MAX_MAP_TASKS=$MAX_TASKS
MAX_REDUCE_TASKS=$MAX_TASKS

Add a comment before the lines which access the user data:
# get user data passed in by the ec2 instance launch
wget -q -O - http://169.254.169.254/latest/user-data | tr ',' '\n' > /tmp/user-data
source /tmp/user-data

> unused and misleading configuration in hadoop-init
> --------------------------------------------------
>
>                 Key: HADOOP-4585
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4585
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: contrib/ec2
>    Affects Versions: 0.18.1
>            Reporter: Karl Anderson
>            Priority: Minor
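The override behaviour described in this issue can be reproduced without launching an instance. The following is only a sketch, not the actual hadoop-init: it replaces the wget call to the metadata service with a hard-coded string, and the USER_DATA contents, the DEFAULT_MAP_TASKS helper variable, and the use of a temporary file are assumptions made for illustration.

```shell
#!/usr/bin/env bash
# Sketch: shows that values sourced from user data replace the defaults.
# INSTANCE_TYPE is set by hand here; on a real instance it would come from elsewhere.
INSTANCE_TYPE="m1.large"

# Defaults, as in the lines proposed for removal.
MAX_TASKS=3
[ "$INSTANCE_TYPE" == "m1.large" ] && MAX_TASKS=6
[ "$INSTANCE_TYPE" == "m1.xlarge" ] && MAX_TASKS=12
MAX_MAP_TASKS=$MAX_TASKS
MAX_REDUCE_TASKS=$MAX_TASKS

# Remember the default so it can be compared after the override (illustrative helper).
DEFAULT_MAP_TASKS=$MAX_MAP_TASKS

# Hypothetical user data: a comma-separated list of VAR=value pairs,
# standing in for the response from http://169.254.169.254/latest/user-data.
USER_DATA='MAX_MAP_TASKS=2,MAX_REDUCE_TASKS=1'

# Same transformation as hadoop-init: one assignment per line, then source it.
USER_DATA_FILE=$(mktemp)
printf '%s\n' "$USER_DATA" | tr ',' '\n' > "$USER_DATA_FILE"
source "$USER_DATA_FILE"
rm -f "$USER_DATA_FILE"

echo "default MAX_MAP_TASKS was $DEFAULT_MAP_TASKS, effective value is $MAX_MAP_TASKS"
echo "effective MAX_REDUCE_TASKS is $MAX_REDUCE_TASKS"
```

Any variable present in the user data wins because source simply re-assigns it after the defaults have been set, which is why the defaults only matter for variables the user data omits.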