Andrew,
The public DNS hostname for an EC2 instance follows a naming
convention based on its IP address, and the EC2 internal DNS system
automatically resolves lookups of public DNS hostnames to the
internal IP. So if you assign an Elastic IP to a server, you may
be able to use the public DNS hostname in your config and avoid the
data transfer fee.
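For example, a quick Python sketch of that behavior (the hostname and
addresses below are made up for illustration):

    import socket

    # Hypothetical public DNS name for an instance whose Elastic IP is
    # 203.0.113.10; the name encodes the IP per EC2's convention.
    public_name = "ec2-203-0-113-10.compute-1.amazonaws.com"

    # Resolved from *inside* EC2, Amazon's resolver returns the internal
    # (private) IP, so traffic using this name stays on the free internal
    # network. Resolved from outside EC2, it returns 203.0.113.10 instead.
    print(socket.gethostbyname(public_name))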
--
Thanks,
Charles Woerner
On Mar 13, 2010, at 12:32 PM, Andrew Purtell <apurt...@apache.org> wrote:
The data will be intact, but the config will be invalidated, right?
After a cluster has been suspended and then resumed, all of the
assigned IP addresses will be different, so this would render all of
the Hadoop and HBase configuration files invalid. The data will be
there, but you will have to go fix up all of the config files on all
of your instances, somehow accounting for which is master, which is
slave, and which is ZooKeeper. Elastic IPs might help, but I wouldn't
use them, because while instance-to-instance data transfers are free,
that is NOT the case when Elastic IPs are used for internal traffic.
This could be automated. You can track the roles of the instances
locally, make a local "suspend script" which shuts Hadoop and HBase
down before you suspend the cluster, and make a local "resume script"
which remembers the role of each instance, logs on to each instance
after it has been reactivated, performs the appropriate substitutions
on the Hadoop and HBase config files, and then restarts the daemons.
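A rough Python sketch of such a resume script. Everything here is an
illustrative assumption, not something EC2 or HBase provides: the
roles.json/new_addrs.json bookkeeping files, the config paths, and the
service names would all depend on your own setup.

    import json
    import subprocess

    # Assumed local bookkeeping written by the suspend script:
    #   roles.json     maps instance id -> {"role": ..., "old_ip": ...}
    #   new_addrs.json maps instance id -> new internal IP after resume
    roles = json.load(open("roles.json"))
    new_addrs = json.load(open("new_addrs.json"))

    # Build one sed command per peer so every stale address in the
    # Hadoop and HBase config files is rewritten to its new value.
    fixups = " && ".join(
        "sudo sed -i 's/%s/%s/g' /etc/hadoop/conf/*.xml /etc/hbase/conf/*.xml"
        % (roles[peer]["old_ip"], new_addrs[peer])
        for peer in new_addrs
    )

    # Hypothetical restart commands, keyed by the recorded role.
    restarts = {
        "master": "sudo service hadoop-namenode restart && sudo service hbase-master restart",
        "slave": "sudo service hadoop-datanode restart && sudo service hbase-regionserver restart",
        "zookeeper": "sudo service zookeeper restart",
    }

    for inst_id, ip in new_addrs.items():
        # Log on to each reactivated instance, patch its configs, and
        # restart the daemons appropriate to its role.
        cmd = fixups + " && " + restarts[roles[inst_id]["role"]]
        subprocess.check_call(["ssh", "root@" + ip, cmd])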
Taking this further:
HBase is almost free of static configuration: the master and the
slaves need to know the network locations of the ZooKeeper quorum
ensemble peers, and the master needs to know the network location of
the HDFS NameNode. If at some future time an option for hosting
Hadoop configuration in ZooKeeper is developed, then the HBase master
could learn the address of the NameNode from ZK. Presumably the HDFS
DataNodes would do the same, and so the only static detail for
everything would be the network location of the ZK ensemble peers. At
that point you could write them as DNS hostnames and then dynamically
update DNS instead of performing a bunch of fixups on config files.
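To illustrate the dynamic-DNS step, here is a Python sketch that
drives nsupdate, assuming an internal zone you control that accepts
RFC 2136 dynamic updates; the zone name, name server, hostnames, and
addresses are all made up:

    import subprocess

    # New internal IPs for the ZK ensemble peers after a resume.
    updates = {
        "zk1.cluster.internal.": "10.0.1.11",
        "zk2.cluster.internal.": "10.0.1.12",
        "zk3.cluster.internal.": "10.0.1.13",
    }

    # Build an nsupdate command script: replace each A record.
    script = ["server ns.cluster.internal"]
    for name, ip in updates.items():
        script.append("update delete %s A" % name)
        script.append("update add %s 300 A %s" % (name, ip))
    script.append("send")

    # Feed the script to nsupdate on stdin.
    subprocess.run(["nsupdate"], input="\n".join(script) + "\n",
                   text=True, check=True)

With something like that in place, hbase.zookeeper.quorum could stay
fixed at zk1.cluster.internal,zk2.cluster.internal,zk3.cluster.internal
across suspend/resume cycles.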
- Andy
----- Original Message ----
From: Jonathan Gray <jl...@streamy.com>
To: hbase-user@hadoop.apache.org
Sent: Sat, March 13, 2010 10:13:08 AM
Subject: RE: on Hadoop reliability wrt. EC2 (was: Re: [databasepro-48] HUG9)
Prasen,
You could definitely do something like that. As long as you keep
everything for your Hadoop/HBase setup on EBS volumes, you should be
able to spin the cluster down, turn off the nodes, and then bring
them back up at a later time with all the data still intact.
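The stop/start cycle itself is easy to script; a minimal sketch using
the boto3 AWS SDK for Python (the instance ids are placeholders, and
this assumes EBS-backed instances, since instance-store roots cannot
be stopped):

    import boto3

    ec2 = boto3.client("ec2", region_name="us-east-1")
    ids = ["i-00000000", "i-11111111"]  # placeholder instance ids

    # Stop (not terminate) the EBS-backed nodes; root and data volumes
    # persist, so the HDFS/HBase data survives until the next start.
    ec2.stop_instances(InstanceIds=ids)

    # Later, bring the cluster back up; new internal IPs will be
    # assigned, which is exactly the config-fixup problem discussed
    # earlier in this thread.
    # ec2.start_instances(InstanceIds=ids)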
JG
-----Original Message-----
On Sat, Mar 13, 2010 at 7:36 AM, prasenjit mukherjee wrote:
I agree that running 24/7 HBase servers on EC2 is not advisable. But
I need some suggestions for running MapReduce jobs (in batches)
followed by updating the results on an existing HBase server.