Gian Lorenzo Thione wrote:
> At Powerset we have used EC2 and Hadoop with a large number of nodes,
> successfully running Map/Reduce computations and HDFS. Pretty much as you
> describe, we use HDFS for intermediate results and caching, and periodically
> extract data to our local network. We are not really using S3 at the moment
> for persistent storage.

Why don't you use S3 for persistent storage? It would seem more economical to keep things there, since transfers between EC2 and S3 are free, while transferring data offsite is rather costly.

> A nice feature of Hadoop, given the way we use EC2, has been the ability to
> fluidly change the number of instances that are part of the cluster. Our
> instances are set up to join the cluster and the DFS as soon as they are
> activated, and when we lose those machines, for whatever reason, the overall
> process doesn't suffer. We have been quite happy with this, even with a
> significant number of instances.

Great to hear!

It would be useful to hear more about how you build your images. If possible, can you share this on the Hadoop wiki, to provide a reference for others?

> As a byproduct of running these experiments, we have implemented some
> patches to Hadoop that report to the master IPs and hostnames different
> from the defaults returned by InetAddress's static method getLocalHost().
> This is because machines are sometimes assigned to specific networks to
> deal with firewalls in various ways, so we may want each machine to report
> to the JobTracker the IP of a specific interface, one through which the
> tracker can contact it back. In our model the interface is specified as
> part of the configuration parameters.
>
> Is this something that the Hadoop project would be interested in
> incorporating?

Yes, please. EC2 seems like a facility that we'd like Hadoop to work well on. It is a great resource for folks who don't have the means to build and operate their own clusters but do sometimes need such large-scale infrastructure for, e.g., experiments, research, testing, etc.
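For reference, the interface-based reporting described in the quoted message can be sketched with the standard java.net.NetworkInterface API. This is a minimal illustration, not the Powerset patch: the class and method names are hypothetical, the interface name is passed in directly (a real daemon would read it from a configuration parameter whose key is not given here), and it assumes the first non-loopback IPv4 address on that interface is the one the JobTracker should use. If no interface is named, or it has no usable address, it falls back to InetAddress.getLocalHost(), the default behavior.

```java
import java.net.Inet4Address;
import java.net.InetAddress;
import java.net.NetworkInterface;
import java.net.SocketException;
import java.net.UnknownHostException;
import java.util.Collections;

// Hypothetical helper: picks the address a node would report to the master.
public class ReportedAddress {

    /**
     * Returns the first non-loopback IPv4 address bound to the named
     * interface, or null if the interface does not exist or has none.
     */
    static InetAddress addressOf(String ifaceName) throws SocketException {
        NetworkInterface nic = NetworkInterface.getByName(ifaceName);
        if (nic == null) {
            return null; // no interface with that name on this machine
        }
        for (InetAddress addr : Collections.list(nic.getInetAddresses())) {
            if (addr instanceof Inet4Address && !addr.isLoopbackAddress()) {
                return addr;
            }
        }
        return null;
    }

    /**
     * Uses the configured interface if it yields an address; otherwise
     * falls back to the default InetAddress.getLocalHost().
     * Assumption: ifaceName would come from configuration in a real daemon.
     */
    static InetAddress reportedAddress(String ifaceName)
            throws SocketException, UnknownHostException {
        if (ifaceName != null && !ifaceName.isEmpty()) {
            InetAddress addr = addressOf(ifaceName);
            if (addr != null) {
                return addr;
            }
        }
        return InetAddress.getLocalHost();
    }

    public static void main(String[] args) throws Exception {
        String iface = args.length > 0 ? args[0] : null;
        InetAddress addr = reportedAddress(iface);
        // Print the IP and hostname this node would report.
        System.out.println(addr.getHostAddress() + " " + addr.getHostName());
    }
}
```

Running `java ReportedAddress eth1` would print the IP and hostname that would be reported for eth1, or the getLocalHost() values if eth1 has no usable IPv4 address.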

Doug
