Andrew,

The public DNS hostname for EC2 instances follows a naming convention based on the IP address, and the EC2 internal DNS system automatically resolves lookups of the public DNS hostname to the internal IP. So if you assign an Elastic IP to a server, you may be able to use the public DNS hostname in your config and avoid the data transfer fee.
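
A quick way to verify that from inside an instance (the hostname below is made up, following the ec2-a-b-c-d pattern; a rough Python check):

    import socket

    # Hypothetical public DNS name for an instance whose Elastic IP is
    # 203.0.113.25 -- EC2 encodes the address right in the hostname.
    public_name = "ec2-203-0-113-25.compute-1.amazonaws.com"

    # Resolved from inside EC2, Amazon's resolver hands back the *internal*
    # IP, so traffic using this name stays on the free internal network.
    # Resolved from outside, the same name returns the public (elastic) IP.
    print(socket.gethostbyname(public_name))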

--
Thanks,

Charles Woerner

On Mar 13, 2010, at 12:32 PM, Andrew Purtell <apurt...@apache.org> wrote:

The data will be intact, but the config will be invalidated, right?

After a cluster has been suspended and then resumed, all of the assigned IP addresses will be different, which renders all of the Hadoop and HBase configuration files invalid. The data will still be there, but you will have to go fix up all of the config files on all of your instances, somehow accounting for which is master, which is slave, and which is ZooKeeper. Elastic IPs might help, but I wouldn't use them, because while instance-to-instance data transfers are free, that is NOT the case when Elastic IPs are used for internal traffic.


This could be automated. You can track the roles of the instances locally, make a local "suspend script" which shuts Hadoop and HBase down before you suspend the cluster, and make a local "resume script" which remembers the role of each instance, logs on to the instance after it has been reactivated, performs the appropriate substitutions on Hadoop and HBase config files, and then restarts the daemons.
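
A minimal sketch of what the resume half could look like, in Python. The placeholder convention (@MASTER@, @ZK1@), the file paths, and the addresses are all invented for illustration:

    #!/usr/bin/env python
    # Hypothetical resume helper: after the instances come back up with
    # new IPs, substitute the new addresses into config files that were
    # pre-seeded with role placeholders on each node.
    import subprocess

    # Role placeholder -> new internal IP, gathered from wherever you
    # track your instances (values here are made up).
    new_addrs = {
        "@MASTER@": "10.1.2.3",
        "@ZK1@":    "10.1.2.4",
        "@ZK2@":    "10.1.2.5",
    }

    CONFIG_FILES = [
        "/usr/local/hadoop/conf/core-site.xml",
        "/usr/local/hbase/conf/hbase-site.xml",
    ]

    def fix_up(host):
        # (Re-copying the placeholder templates to the host first is
        # omitted here.) Substitute every role's new address in place.
        for path in CONFIG_FILES:
            for role, addr in new_addrs.items():
                subprocess.check_call(
                    ["ssh", host, "sed", "-i",
                     "s/%s/%s/g" % (role, addr), path])

    for host in new_addrs.values():
        fix_up(host)
    # ...then restart the daemons on each node, master and ZK first.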

Taking this further:

HBase is almost free of static configuration: the master and the slaves need to know the network locations of the ZooKeeper quorum ensemble peers, and the master needs to know the network location of the HDFS NameNode. If, at some future time, an option for hosting Hadoop configuration in ZooKeeper is developed, then the HBase master could learn the address of the NameNode from ZK. Presumably the HDFS DataNodes would do the same, and so the only static detail for everything would be the network location of the ZK ensemble peers. At that point you could write them as DNS hostnames and then dynamically update DNS instead of performing a bunch of fixups on config files.
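
To make that concrete: hbase.zookeeper.quorum is the real HBase property; the stable names and the idea of repointing them are the part being sketched (Python, names invented):

    # Sketch: point HBase at stable DNS names rather than raw IPs, so a
    # resume only has to remap DNS records (or /etc/hosts entries), not
    # edit config files. The zkN.mycluster.internal names are invented.
    quorum = "zk1.mycluster.internal,zk2.mycluster.internal,zk3.mycluster.internal"

    hbase_site = """<configuration>
      <property>
        <name>hbase.zookeeper.quorum</name>
        <value>%s</value>
      </property>
    </configuration>
    """ % quorum

    with open("/usr/local/hbase/conf/hbase-site.xml", "w") as f:
        f.write(hbase_site)

    # On resume, an nsupdate against your own zone (or rewriting
    # /etc/hosts on every node, if you don't run DNS) maps each zkN name
    # to its new IP -- no per-file config surgery required.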

   - Andy


----- Original Message ----
From: Jonathan Gray <jl...@streamy.com>
To: hbase-user@hadoop.apache.org
Sent: Sat, March 13, 2010 10:13:08 AM
Subject: RE: on Hadoop reliability wrt. EC2 (was: Re: [databasepro-48] HUG9)

Prasen,

You could definitely do something like that. As long as everything in your Hadoop/HBase setup lives on EBS volumes, you should be able to spin the cluster down, turn off the nodes, and then bring them back up at a later time with all the data still intact.
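
A rough sketch of the stop/start side with boto (instance ids made up; assumes EBS-backed instances, and treat the exact calls as approximate):

    import boto

    # Hypothetical instance ids for the cluster nodes.
    CLUSTER = ["i-11111111", "i-22222222", "i-33333333"]

    conn = boto.connect_ec2()  # credentials picked up from the environment

    # Stop (not terminate!) keeps the EBS volumes around, so everything
    # comes back when you start the instances again.
    conn.stop_instances(instance_ids=CLUSTER)

    # ...later:
    conn.start_instances(instance_ids=CLUSTER)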

JG


-----Original Message-----
On Sat, Mar 13, 2010 at 7:36 AM, prasenjit mukherjee wrote:
I agree that running 24/7 HBase servers on EC2 is not advisable. But I need some suggestions for running mapred jobs (in batches) followed by updating the results on an existing HBase server.



