> Yeah, I actually read all of the wiki and your article about using
> Hadoop on EC2/S3 and I can't really find a reference to the S3 support
> not being for "regular" S3 keys. Did I miss something or should I
> update the wiki to make it more clear (or both)?

I don't think this is explained clearly enough, so please do update the
wiki. Thanks.
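When you update it, it might be worth showing concretely that the S3
support is a block-based filesystem: Hadoop stores data in its own block
format, so objects uploaded with ordinary S3 tools aren't visible to it
(and vice versa). A sketch, with a made-up bucket name and assuming
fs.s3.awsAccessKeyId and fs.s3.awsSecretAccessKey are set in
hadoop-site.xml:

  # Copy a local file into the S3 block filesystem and list it back:
  bin/hadoop fs -put README.txt s3://my-hadoop-bucket/README.txt
  bin/hadoop fs -ls s3://my-hadoop-bucket/

  # Listing the same bucket with a plain S3 client shows opaque block
  # objects rather than README.txt -- which is also why pre-existing
  # "regular" S3 keys don't appear through Hadoop.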
> Also, the instructions on the EC2 page on the wiki no longer work, in
> that due to the kind of NAT Amazon is using, the slaves can't connect
> to the master using an externally-resolved IP address via a DNS name.
> What I mean is, if you set DNS to the external IP of your master
> instance, your slaves can resolve that address but cannot then connect
> to it. So, I had to alter the launch-hadoop-cluster and start-hadoop
> scripts and merge them to just pick the master and use its EC2-given
> name as the $MASTER_HOST to make it work.

This sounds like the problem fixed in
https://issues.apache.org/jira/browse/HADOOP-1638 in 0.14.0, which is
the version you're using, isn't it?

Are you able to do 'bin/hadoop-ec2 launch-cluster' and then (on your
workstation) run

  . bin/hadoop-ec2-env.sh
  ssh $SSH_OPTS "[EMAIL PROTECTED]" \
    "sed -i -e \"s/$MASTER_HOST/\$(hostname)/g\" \
       /usr/local/hadoop-$HADOOP_VERSION/conf/hadoop-site.xml"

and then check to see whether the master host has been set correctly
(to the internal IP) in the master host's hadoop-site.xml? (One way to
do that check is sketched at the end of this mail.)

Also, what version of the EC2 tools are you using?

> I also updated the scripts
> to only look for a given AMI ID and only start/manage/terminate
> instances of that AMI ID (since I have others I'd rather not have
> terminated just on the basis of their AMI launch index ;-)).

Instances are terminated on the basis of their AMI ID since 0.14.0. See
https://issues.apache.org/jira/browse/HADOOP-1504.
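Re the check mentioned above: assuming GNU grep on the image, something
like this (property names as in a stock 0.14 hadoop-site.xml) should
show whether the master address ended up as the internal one:

  . bin/hadoop-ec2-env.sh
  # Print the <name>/<value> pairs for the filesystem and jobtracker
  # addresses; the values should contain the master's internal EC2
  # hostname (what 'hostname' prints on the master), not the external
  # name.
  ssh $SSH_OPTS "[EMAIL PROTECTED]" \
    "grep -A 1 -e fs.default.name -e mapred.job.tracker \
       /usr/local/hadoop-$HADOOP_VERSION/conf/hadoop-site.xml"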
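And re HADOOP-1504: from memory, the 0.14 termination logic amounts to
something like the following ($AMI_IMAGE is illustrative; check
hadoop-ec2-env.sh for the real variable name), so only instances of the
Hadoop AMI should be touched:

  # List all instances, keep only INSTANCE lines for the given AMI ID,
  # extract the instance IDs (second field), and terminate just those.
  ec2-describe-instances | grep INSTANCE | grep "$AMI_IMAGE" \
    | awk '{print $2}' | xargs ec2-terminate-instances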
Tom