On 9/5/07, Tom White <[EMAIL PROTECTED]> wrote:
> There is an open Jira issue for adding support for reading regular S3
> files (https://issues.apache.org/jira/browse/HADOOP-930) which would
> solve this problem. Until this is implemented Ahad's suggestion is a
> good workaround (you can use distcp too).
Yeah, I actually read all of the wiki and your article about using Hadoop on EC2/S3, and I can't find any mention of the S3 support not being for "regular" S3 keys. Did I miss something, or should I update the wiki to make it clearer (or both)?

Also, the instructions on the EC2 page of the wiki no longer work: because of the kind of NAT Amazon is using, the slaves can't connect to the master through an externally-resolved IP address behind a DNS name. What I mean is, if you point a DNS name at the external IP of your master instance, your slaves can resolve that name, but they cannot then connect to it. So I had to alter the launch-hadoop-cluster and start-hadoop scripts and merge them so they just pick the master and use its EC2-given hostname as the $MASTER_HOST to make it work (roughly along the lines of the first sketch below).

I also updated the scripts to look only for a given AMI ID and to start/manage/terminate only instances of that AMI ID, since I have other instances I'd rather not terminate just on the basis of their AMI launch index ;-). The second sketch below shows the kind of filtering I mean.
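A minimal sketch of the master-host selection, assuming the classic ec2-describe-instances output where INSTANCE lines carry the instance ID, AMI ID, public DNS name, private DNS name, and state in that order (the field positions and the AMI ID are placeholders, not verbatim from my scripts):

#!/bin/sh
# Pick the first running instance of our AMI and use its EC2-assigned
# hostname as the master host, so slaves resolve an address they can
# actually reach from inside EC2.
# Assumed INSTANCE line layout: $2 = instance id, $3 = AMI id,
# $4 = public DNS, $5 = private DNS, $6 = state.
AMI_ID=ami-12345678   # placeholder: your Hadoop AMI ID

MASTER_HOST=$(ec2-describe-instances | \
  awk -v ami="$AMI_ID" \
    '$1 == "INSTANCE" && $3 == ami && $6 == "running" { print $5; exit }')

echo "MASTER_HOST=$MASTER_HOST"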
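And a sketch of terminating by AMI ID instead of by launch index, under the same output-format assumptions as above:

#!/bin/sh
# Terminate only instances launched from our AMI, leaving any other
# instances in the account alone.
AMI_ID=ami-12345678   # placeholder: your Hadoop AMI ID

INSTANCES=$(ec2-describe-instances | \
  awk -v ami="$AMI_ID" '$1 == "INSTANCE" && $3 == ami { print $2 }')

if [ -n "$INSTANCES" ]; then
  ec2-terminate-instances $INSTANCES
fi

--
Toby DiPasquale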
