Hi Toby / Tom,

The failure to copy during the reduce phase is one of the side effects
of the bug described in HADOOP-1783. You should apply Tom's patch and
then see if the failure condition still exists.
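For reference, applying a JIRA patch to a source tree typically looks
something like this (a sketch only; the patch filename is hypothetical,
and -p0 assumes the patch was generated from the top of the Hadoop
source tree):

  cd hadoop-0.14.0
  patch -p0 < HADOOP-1783.patch    # filename is a placeholder
  ant jar                          # rebuild with Ant (target name may vary by version)

You'd then need to push the rebuilt jar out to every node in the cluster.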
Thanks,
Ahad.

On 9/6/07, Toby DiPasquale <[EMAIL PROTECTED]> wrote:
>
> On 9/6/07, Tom White <[EMAIL PROTECTED]> wrote:
> > > Yeah, I actually read all of the wiki and your article about using
> > > Hadoop on EC2/S3 and I can't really find a reference to the S3 support
> > > not being for "regular" S3 keys. Did I miss something or should I
> > > update the wiki to make it more clear (or both)?
> >
> > I don't think this is explained clearly enough, so please do update
> > the wiki. Thanks.
>
> I just updated the page to add a Notes section explaining the issue
> and referencing the JIRA issue # you mentioned earlier.
>
> > > Also, the instructions on the EC2 page on the wiki no longer work, in
> > > that due to the kind of NAT Amazon is using, the slaves can't connect
> > > to the master using an externally-resolved IP address via a DNS name.
> > > What I mean is, if you set DNS to the external IP of your master
> > > instance, your slaves can resolve that address but cannot then connect
> > > to it. So, I had to alter the launch-hadoop-cluster and start-hadoop
> > > scripts and merge them to just pick the master and use its EC2-given
> > > name as the $MASTER_HOST to make it work.
> >
> > This sounds like the problem fixed in
> > https://issues.apache.org/jira/browse/HADOOP-1638 in 0.14.0, which is
> > the version you're using, isn't it?
> >
> > Are you able to do 'bin/hadoop-ec2 launch-cluster' and then (on your
> > workstation)
> >
> > . bin/hadoop-ec2-env.sh
> > ssh $SSH_OPTS "[EMAIL PROTECTED]" "sed -i -e
> > \"s/$MASTER_HOST/\$(hostname)/g\"
> > /usr/local/hadoop-$HADOOP_VERSION/conf/hadoop-site.xml"
> >
> > and then check to see if the master host has been set correctly (to
> > the internal IP) in the master host's hadoop-site.xml?
>
> Well, no, since my $MASTER_HOST is now just the external DNS name of
> the first instance started in the reservation, but this is performed
> as part of my launch-hadoop-cluster script. In any case, that value is
> not set to the internal IP, but rather to the hostname portion of the
> internal DNS name.
>
> Currently, my MR jobs are failing because the reducers can't copy the
> map output, and I'm thinking it might be because there is some kind of
> external address getting in there somehow. I see connections to
> external IPs in netstat -tan (72.* addresses). Any ideas about that?
> In the hadoop-site.xml files on the slaves, the address is the external
> DNS name of the master (ec2-*), but that resolves to the internal 10/8
> address like it should.
>
> > Also, what version of the EC2 tools are you using?
>
> black:~/code/hadoop-0.14.0/src/contrib/ec2> ec2-version
> 1.2-11797 2007-03-01
> black:~/code/hadoop-0.14.0/src/contrib/ec2>
> > > I also updated the scripts
> > > to only look for a given AMI ID and only start/manage/terminate
> > > instances of that AMI ID (since I have others I'd rather not have
> > > terminated just on the basis of their AMI launch index ;-)).
> >
> > Instances are terminated on the basis of their AMI ID since 0.14.0.
> > See https://issues.apache.org/jira/browse/HADOOP-1504.
>
> I felt this was unsafe as it was, since it looked for the name of an
> image and then reversed it to the AMI ID. I just hacked it so you have
> to put in the AMI ID in hadoop-ec2-env.sh. Also, the script as it
> stands doesn't grep for 'running', so it may potentially shut down
> some instances starting up in another cluster. I may just be paranoid,
> however ;)
>
> --
> Toby DiPasquale
>
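P.S. On the terminate-cluster point: a guard like the one Toby describes
might look something like the following. This is only a sketch; the awk
field positions assume the INSTANCE line format printed by the 1.2-era
EC2 API tools (field 3 = AMI ID, field 6 = state) and may differ in
other versions, and AMI_IMAGE is assumed to be set in hadoop-ec2-env.sh:

  # Terminate only instances of the configured AMI that are actually
  # in the 'running' state, so pending instances from another cluster
  # launch are left alone.
  . bin/hadoop-ec2-env.sh
  INSTANCES=`ec2-describe-instances | \
    awk -v ami="$AMI_IMAGE" \
      '$1 == "INSTANCE" && $3 == ami && $6 == "running" { print $2 }'`
  if [ -n "$INSTANCES" ]; then
    ec2-terminate-instances $INSTANCES
  fi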
