> Yeah, I actually read all of the wiki and your article about using
> Hadoop on EC2/S3 and I can't really find a reference to the S3 support
> not being for "regular" S3 keys. Did I miss something or should I
> update the wiki to make it more clear (or both)?

I don't think this is explained clearly enough, so please do update the
wiki. Thanks.
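When you update it, it might be worth showing concretely that the S3
support is a block-based filesystem: Hadoop stores data in its own block
format, so objects uploaded with ordinary S3 tools aren't visible to it
(and vice versa). A sketch, with a made-up bucket name and assuming
fs.s3.awsAccessKeyId and fs.s3.awsSecretAccessKey are set in
hadoop-site.xml:

  # Copy a local file into the S3 block filesystem and list it back:
  bin/hadoop fs -put README.txt s3://my-hadoop-bucket/README.txt
  bin/hadoop fs -ls s3://my-hadoop-bucket/

  # Listing the same bucket with a plain S3 client shows opaque block
  # objects rather than README.txt -- which is also why pre-existing
  # "regular" S3 keys don't appear through Hadoop.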
> Also, the instructions on the EC2 page on the wiki no longer work, in
> that due to the kind of NAT Amazon is using, the slaves can't connect
> to the master using an externally-resolved IP address via a DNS name.
> What I mean is, if you set DNS to the external IP of your master
> instance, your slaves can resolve that address but cannot then connect
> to it. So, I had to alter the launch-hadoop-cluster and start-hadoop
> scripts and merge them to just pick the master and use its EC2-given
> name as the $MASTER_HOST to make it work.

This sounds like the problem fixed in
https://issues.apache.org/jira/browse/HADOOP-1638 in 0.14.0, which is
the version you're using, isn't it?

Are you able to do 'bin/hadoop-ec2 launch-cluster' and then (on your
workstation) run

  . bin/hadoop-ec2-env.sh
  ssh $SSH_OPTS "[EMAIL PROTECTED]" \
    "sed -i -e \"s/$MASTER_HOST/\$(hostname)/g\" \
       /usr/local/hadoop-$HADOOP_VERSION/conf/hadoop-site.xml"

and then check to see whether the master host has been set correctly
(to the internal IP) in the master host's hadoop-site.xml? (One way to
do that check is sketched at the end of this mail.)

Also, what version of the EC2 tools are you using?

> I also updated the scripts
> to only look for a given AMI ID and only start/manage/terminate
> instances of that AMI ID (since I have others I'd rather not have
> terminated just on the basis of their AMI launch index ;-)).

Instances are terminated on the basis of their AMI ID since 0.14.0. See
https://issues.apache.org/jira/browse/HADOOP-1504.
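Re the check mentioned above: assuming GNU grep on the image, something
like this (property names as in a stock 0.14 hadoop-site.xml) should
show whether the master address ended up as the internal one:

  . bin/hadoop-ec2-env.sh
  # Print the <name>/<value> pairs for the filesystem and jobtracker
  # addresses; the values should contain the master's internal EC2
  # hostname (what 'hostname' prints on the master), not the external
  # name.
  ssh $SSH_OPTS "[EMAIL PROTECTED]" \
    "grep -A 1 -e fs.default.name -e mapred.job.tracker \
       /usr/local/hadoop-$HADOOP_VERSION/conf/hadoop-site.xml"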
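And re HADOOP-1504: from memory, the 0.14 termination logic amounts to
something like the following ($AMI_IMAGE is illustrative; check
hadoop-ec2-env.sh for the real variable name), so only instances of the
Hadoop AMI should be touched:

  # List all instances, keep only INSTANCE lines for the given AMI ID,
  # extract the instance IDs (second field), and terminate just those.
  ec2-describe-instances | grep INSTANCE | grep "$AMI_IMAGE" \
    | awk '{print $2}' | xargs ec2-terminate-instances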
Tom