On 9/5/07, Tom White <[EMAIL PROTECTED]> wrote:
> There is an open Jira issue for adding support for reading regular S3
> files (https://issues.apache.org/jira/browse/HADOOP-930) which would
> solve this problem. Until this is implemented Ahad's suggestion is a
> good workaround (you can use distcp too).
Yeah, I actually read all of the wiki and your article about using Hadoop on EC2/S3, and I can't find any mention of the S3 support not being for "regular" S3 keys. Did I miss something, or should I update the wiki to make it clearer (or both)?

Also, the instructions on the EC2 page of the wiki no longer work: because of the kind of NAT Amazon is using, the slaves can't connect to the master through an externally-resolved IP address behind a DNS name. What I mean is, if you point a DNS name at the external IP of your master instance, your slaves can resolve that name, but they cannot then connect to it. So I had to alter the launch-hadoop-cluster and start-hadoop scripts and merge them so they just pick the master and use its EC2-given hostname as the $MASTER_HOST to make it work (roughly along the lines of the first sketch below).

I also updated the scripts to look only for a given AMI ID and to start/manage/terminate only instances of that AMI ID, since I have other instances I'd rather not terminate just on the basis of their AMI launch index ;-). The second sketch below shows the kind of filtering I mean.
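A minimal sketch of the master-host selection, assuming the classic ec2-describe-instances output where INSTANCE lines carry the instance ID, AMI ID, public DNS name, private DNS name, and state in that order (the field positions and the AMI ID are placeholders, not verbatim from my scripts):

#!/bin/sh
# Pick the first running instance of our AMI and use its EC2-assigned
# hostname as the master host, so slaves resolve an address they can
# actually reach from inside EC2.
# Assumed INSTANCE line layout: $2 = instance id, $3 = AMI id,
# $4 = public DNS, $5 = private DNS, $6 = state.
AMI_ID=ami-12345678   # placeholder: your Hadoop AMI ID

MASTER_HOST=$(ec2-describe-instances | \
  awk -v ami="$AMI_ID" \
    '$1 == "INSTANCE" && $3 == ami && $6 == "running" { print $5; exit }')

echo "MASTER_HOST=$MASTER_HOST"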
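And a sketch of terminating by AMI ID instead of by launch index, under the same output-format assumptions as above:

#!/bin/sh
# Terminate only instances launched from our AMI, leaving any other
# instances in the account alone.
AMI_ID=ami-12345678   # placeholder: your Hadoop AMI ID

INSTANCES=$(ec2-describe-instances | \
  awk -v ami="$AMI_ID" '$1 == "INSTANCE" && $3 == ami { print $2 }')

if [ -n "$INSTANCES" ]; then
  ec2-terminate-instances $INSTANCES
fi

--
Toby DiPasquale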
