[
https://issues.apache.org/jira/browse/HADOOP-2410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12582880#action_12582880
]
Chris K Wensel commented on HADOOP-2410:
----------------------------------------
This patch represents a fair number of changes and will need accompanying
documentation.
The typical use case is this:
> hadoop-ec2 launch-cluster my-group 5
> hadoop-ec2 push my-group path/to/some.jar
> hadoop-ec2 login my-group
> hadoop-ec2 terminate-cluster my-group
In another window (after launch-cluster), this is quite useful, and works well
with FoxyProxy:
> hadoop-ec2 proxy my-group
There are still some rough edges, I think.
> hadoop-ec2
Usage: hadoop-ec2 COMMAND
where COMMAND is one of:
  list                                 list all running Hadoop EC2 clusters
  launch-cluster <group> <num slaves>  launch a cluster of Hadoop EC2 instances
                                       - launch-master then launch-slaves
  launch-master <group>                launch or find a cluster master
  launch-slaves <group> <num slaves>   launch the cluster slaves
  terminate-cluster                    terminate all Hadoop EC2 instances
  login <group|instance id>            login to the master node of the Hadoop
                                       EC2 cluster
  screen <group|instance id>           start or attach 'screen' on the master
                                       node of the Hadoop EC2 cluster
  proxy <group|instance id>            start a socks proxy on localhost:6666
                                       (use w/foxyproxy)
  push <group> <file>                  scp a file to the master node of the
                                       Hadoop EC2 cluster
  <shell cmd> <group|instance id>      execute any command remotely on the
                                       master
  create-image                         create a Hadoop AMI
> Make EC2 cluster nodes more independent of each other
> -----------------------------------------------------
>
> Key: HADOOP-2410
> URL: https://issues.apache.org/jira/browse/HADOOP-2410
> Project: Hadoop Core
> Issue Type: Improvement
> Components: contrib/ec2
> Affects Versions: 0.16.1
> Reporter: Tom White
> Attachments: concurrent-clusters.patch
>
>
> The cluster start up scripts currently wait for each node to start up before
> appointing a master (to run the namenode and jobtracker on), and copying
> private keys to all the nodes, and writing the private IP address of the
> master to the hadoop-site.xml file (which is then copied to the slaves via
> rsync). Only once this is all done is hadoop started on the cluster (from the
> master). This can fail if any of the nodes fails to come up, which can happen
> as EC2 doesn't guarantee that you get a cluster of the size you ask for (I've
> seen this happen).
> The process would be more robust if each node was told the address of the
> master as user metadata and then started its own daemons. This is complicated
> by the fact that the public DNS alias of the master resolves to a public IP
> address so cannot be used by EC2 nodes (see
> http://docs.amazonwebservices.com/AWSEC2/2007-08-29/DeveloperGuide/instance-addressing.html).
> Instead we need to use a trick
> (http://developer.amazonwebservices.com/connect/message.jspa?messageID=71126#71126)
> to find the private IP, and what's more we need to attempt to resolve the
> private IP in a loop until it is available since the DNS will only be set up
> after the master has started.
> This change will also mean the private key doesn't need to be copied to each
> node, which can be slow and has dubious security. Configuration can be
> handled using the mechanism described in HADOOP-2409.
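
The "resolve in a loop until it is available" step from the description above could look roughly like the sketch below. It is not taken from the patch: the function names, the RESOLVER hook, the retry count and the delay are all illustrative assumptions, and it assumes a Linux node where `getent` is available. It only shows the retry loop, not the trick for obtaining the private address.

```shell
#!/bin/sh
# Sketch only: poll until the master's hostname resolves, then print its address.
# All names and defaults here are hypothetical, not part of hadoop-ec2.

# resolve_host HOST: print the first address found for HOST (nothing on failure).
resolve_host() {
  getent hosts "$1" | awk 'NR == 1 { print $1 }'
}

# wait_for_master HOST [ATTEMPTS] [DELAY_SECONDS]
# Prints the resolved address and returns 0, or returns 1 after ATTEMPTS tries.
# The body runs in a subshell so its variables do not leak into the caller.
wait_for_master() (
  host="$1"
  attempts="${2:-60}"
  delay="${3:-5}"
  resolver="${RESOLVER:-resolve_host}"   # overridable, e.g. for testing
  i=1
  while [ "$i" -le "$attempts" ]; do
    ip="$("$resolver" "$host" || true)"
    if [ -n "$ip" ]; then
      printf '%s\n' "$ip"
      return 0
    fi
    sleep "$delay"
    i=$((i + 1))
  done
  echo "gave up resolving $host after $attempts attempts" >&2
  return 1
)
```

A node's startup script might then call something like `MASTER_IP="$(wait_for_master "$MASTER_HOST")" || exit 1` before starting its own daemons, where `MASTER_HOST` stands in for whatever name is passed to the instance as user metadata.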