Andrew,

The public DNS hostname for EC2 instances follows a naming convention based on the IP address, and the EC2 internal DNS system automatically resolves lookups of the public DNS hostname to the internal IP. So if you assign an Elastic IP to a server, you may be able to use the public DNS hostname in your config and avoid the data transfer fee.
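
A quick way to verify that from inside an instance (the hostname below is made up, following the ec2-a-b-c-d pattern; a rough Python check):

    import socket

    # Hypothetical public DNS name for an instance whose Elastic IP is
    # 203.0.113.25 -- EC2 encodes the address right in the hostname.
    public_name = "ec2-203-0-113-25.compute-1.amazonaws.com"

    # Resolved from inside EC2, Amazon's resolver hands back the *internal*
    # IP, so traffic using this name stays on the free internal network.
    # Resolved from outside, the same name returns the public (elastic) IP.
    print(socket.gethostbyname(public_name))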

--
Thanks,

Charles Woerner

On Mar 13, 2010, at 12:32 PM, Andrew Purtell <apurt...@apache.org> wrote:

The data will be intact, but the config will be invalidated, right?

After a cluster has been suspended and then resumed, all of the assigned IP addresses will be different, which renders all of the Hadoop and HBase configuration files invalid. The data will still be there, but you will have to go fix up all of the config files on all of your instances, somehow accounting for which is master, which is slave, and which is ZooKeeper. Elastic IPs might help, but I wouldn't use them, because while instance-to-instance data transfers are free, that is NOT the case when Elastic IPs are used for internal traffic.


This could be automated. You can track the roles of the instances locally, make a local "suspend script" which shuts Hadoop and HBase down before you suspend the cluster, and make a local "resume script" which remembers the role of each instance, logs on to the instance after it has been reactivated, performs the appropriate substitutions on Hadoop and HBase config files, and then restarts the daemons.
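
A minimal sketch of what the resume half could look like, in Python. The placeholder convention (@MASTER@, @ZK1@), the file paths, and the addresses are all invented for illustration:

    #!/usr/bin/env python
    # Hypothetical resume helper: after the instances come back up with
    # new IPs, substitute the new addresses into config files that were
    # pre-seeded with role placeholders on each node.
    import subprocess

    # Role placeholder -> new internal IP, gathered from wherever you
    # track your instances (values here are made up).
    new_addrs = {
        "@MASTER@": "10.1.2.3",
        "@ZK1@":    "10.1.2.4",
        "@ZK2@":    "10.1.2.5",
    }

    CONFIG_FILES = [
        "/usr/local/hadoop/conf/core-site.xml",
        "/usr/local/hbase/conf/hbase-site.xml",
    ]

    def fix_up(host):
        # (Re-copying the placeholder templates to the host first is
        # omitted here.) Substitute every role's new address in place.
        for path in CONFIG_FILES:
            for role, addr in new_addrs.items():
                subprocess.check_call(
                    ["ssh", host, "sed", "-i",
                     "s/%s/%s/g" % (role, addr), path])

    for host in new_addrs.values():
        fix_up(host)
    # ...then restart the daemons on each node, master and ZK first.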

Taking this further:

HBase is almost free of static configuration: the master and the slaves need to know the network locations of the ZooKeeper quorum ensemble peers, and the master needs to know the network location of the HDFS NameNode. If, at some future time, an option for hosting Hadoop configuration in ZooKeeper is developed, then the HBase master could learn the address of the NameNode from ZK. Presumably the HDFS DataNodes would do the same, and so the only static detail for everything would be the network location of the ZK ensemble peers. At that point you could write them as DNS hostnames and then dynamically update DNS instead of performing a bunch of fixups on config files.
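
To make that concrete: hbase.zookeeper.quorum is the real HBase property; the stable names and the idea of repointing them are the part being sketched (Python, names invented):

    # Sketch: point HBase at stable DNS names rather than raw IPs, so a
    # resume only has to remap DNS records (or /etc/hosts entries), not
    # edit config files. The zkN.mycluster.internal names are invented.
    quorum = "zk1.mycluster.internal,zk2.mycluster.internal,zk3.mycluster.internal"

    hbase_site = """<configuration>
      <property>
        <name>hbase.zookeeper.quorum</name>
        <value>%s</value>
      </property>
    </configuration>
    """ % quorum

    with open("/usr/local/hbase/conf/hbase-site.xml", "w") as f:
        f.write(hbase_site)

    # On resume, an nsupdate against your own zone (or rewriting
    # /etc/hosts on every node, if you don't run DNS) maps each zkN name
    # to its new IP -- no per-file config surgery required.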

   - Andy


----- Original Message ----
From: Jonathan Gray <jl...@streamy.com>
To: hbase-user@hadoop.apache.org
Sent: Sat, March 13, 2010 10:13:08 AM
Subject: RE: on Hadoop reliability wrt. EC2 (was: Re: [databasepro-48] HUG9)

Prasen,

You could definitely do something like that. As long as everything in your Hadoop/HBase setup lives on EBS volumes, you should be able to spin the cluster down, turn off the nodes, and then bring them back up at a later time with all the data still intact.
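
A rough sketch of the stop/start side with boto (instance ids made up; assumes EBS-backed instances, and treat the exact calls as approximate):

    import boto

    # Hypothetical instance ids for the cluster nodes.
    CLUSTER = ["i-11111111", "i-22222222", "i-33333333"]

    conn = boto.connect_ec2()  # credentials picked up from the environment

    # Stop (not terminate!) keeps the EBS volumes around, so everything
    # comes back when you start the instances again.
    conn.stop_instances(instance_ids=CLUSTER)

    # ...later:
    conn.start_instances(instance_ids=CLUSTER)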

JG


-----Original Message-----
On Sat, Mar 13, 2010 at 7:36 AM, prasenjit mukherjee wrote:
I agree that running 24/7 HBase servers on EC2 is not advisable. But I need some suggestions for running mapred jobs (in batches) followed by updating the results on an existing HBase server.



