My second question is about the EC2 machines: has anyone solved the hostname problem in an automated way?

For example, if I launch an EC2 server to run a tasktracker, the hostname it reports back to my local cluster is its internal address, so the local reduce tasks cannot fetch the map files from the EC2 machine under that default hostname.
I get an error:
WARN org.apache.hadoop.mapred.ReduceTask: java.net.UnknownHostException: domU-12-31-39-00-A4-05.compute-1.internal

<question>
Is there an automated way to start a tasktracker on an EC2 machine so that it uses the public hostname, letting the local tasks get the maps from the EC2 machines?
For example, something like:
bin/hadoop-daemon.sh start tasktracker host=ec2-xx-xx-xx-xx.z-2.compute-1.amazonaws.com

that I can run to start just the tasktracker with the correct hostname.
</question>

What I am trying to do is build a custom AMI that I can launch whenever I need to add extra CPU power to my cluster, and have
the tasktracker start automatically via a shell script run at startup.
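
Something like the following is what I have in mind for that startup script (just a sketch; it assumes HADOOP_HOME is set on the AMI, that the namenode/jobtracker addresses below are replaced with the real ones, and that this Hadoop version honors the slave.host.name property for the tasktracker, which would need verifying):

#!/bin/sh
# Sketch of a boot script for a tasktracker-only EC2 AMI.
# Assumes HADOOP_HOME is set and that slave.host.name is honored by this Hadoop version.

# Ask the EC2 instance metadata service for this machine's public hostname.
PUBLIC_HOSTNAME=`wget -q -O - http://169.254.169.254/latest/meta-data/public-hostname`

# Write a site config that points back at the home cluster (placeholder
# addresses) and tells the tasktracker to advertise its public hostname
# instead of the internal domU-*.compute-1.internal name.
cat > $HADOOP_HOME/conf/hadoop-site.xml <<EOF
<?xml version="1.0"?>
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://namenode.example.com:9000</value>
  </property>
  <property>
    <name>mapred.job.tracker</name>
    <value>jobtracker.example.com:9001</value>
  </property>
  <property>
    <name>slave.host.name</name>
    <value>$PUBLIC_HOSTNAME</value>
  </property>
</configuration>
EOF

# Start only the tasktracker on this instance.
$HADOOP_HOME/bin/hadoop-daemon.sh start tasktracker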

Billy


"Billy Pearson" <[EMAIL PROTECTED]> wrote in message news:[EMAIL PROTECTED]
I have a question that someone may have answered here before, but I cannot find the answer.

Assume I have a cluster of servers hosting a large amount of data.
I want to run a large job where the maps take a lot of CPU power and the reduces take only a small amount. I want to run the maps on a group of EC2 servers and run the reduces on the local cluster of 10 machines.

The problem I am seeing is the map outputs: if I run the maps on EC2, they are stored locally on the instance. What I am looking to do is have the map output files stored in HDFS, so I can kill the EC2 instances since I do not need them for the reduces.

The only way I can think to do this is to run two jobs: one map-only job that stores its output on HDFS, and then a second job that runs the reduces
from the map outputs stored on HDFS.
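
To make that concrete, the two-job version I have in mind would look roughly like this from the shell; the jar, the driver class names, and the paths are only placeholders, and the first driver would be written as a map-only job so its output lands in HDFS:

# Job 1: a map-only job (no reduce phase configured) run on the EC2
# tasktrackers; with no reduces, the map output is written to HDFS as the
# job's final output.
bin/hadoop jar myjob.jar org.example.MyMapOnlyDriver /input /tmp/map-output

# The EC2 instances can be terminated here, since the map output is in HDFS.

# Job 2: run on the local cluster, reading the stored map output and
# running the real reduces (with an identity map phase).
bin/hadoop jar myjob.jar org.example.MyReduceDriver /tmp/map-output /output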

Is there a way to make the mappers store the final output in HDFS?



