On Mon, Oct 19, 2009 at 5:34 PM, Mark Stetzer <[email protected]> wrote:
> Hi Tom,
>
> The terminate-cluster script only lists the instances that are part of
> the cluster (master and all slaves), as far as I can tell. As an
> example, I set up a cluster of 1 master and 5 slaves, then started an
> additional non-Hadoop server via the AWS mgmt. console running a
> completely different AMI (OpenSolaris 2009.06, just to be very
> different). terminate-cluster only listed the 6 instances that were
> part of the cluster, if I remember correctly.
>
> I have 4 security groups: default, default-master, default-slave, and
> mark-default. mark-default wasn't even added until after I started
> the Hadoop cluster; I added it to log in to the OpenSolaris instance.
I think there is a bug here. I've filed
https://issues.apache.org/jira/browse/HADOOP-6320. As an immediate
workaround, you can avoid calling the Hadoop cluster "default", and make
sure that you don't create non-Hadoop EC2 instances in the cluster group.

Thanks,
Tom

> Does this help at all? Thanks.
>
> -Mark
>
> On Mon, Oct 19, 2009 at 11:52 AM, Tom White <[email protected]> wrote:
>> Hi Mark,
>>
>> Sorry to hear that all your EC2 instances were terminated. Needless to
>> say, this should certainly not happen.
>>
>> The scripts are a Python rewrite (see HADOOP-6108) of the bash ones, so
>> HADOOP-1504 is not applicable, but the behaviour should be the same:
>> the terminate-cluster command lists the instances that it will
>> terminate and prompts for confirmation before terminating them. Is it
>> listing instances that are not in the cluster? I have used this script
>> a lot and it has never terminated any instances that are not in the
>> cluster.
>>
>> What are the names of the security groups that the instances are in
>> (both those in the cluster, and those outside the cluster that are
>> inadvertently terminated)?
>>
>> Thanks,
>> Tom
>>
>> On Mon, Oct 19, 2009 at 4:41 PM, Mark Stetzer <[email protected]> wrote:
>>> Hey all,
>>>
>>> While running the (latest as of Friday) Cloudera-created EC2 scripts,
>>> I noticed that running the terminate-cluster script kills ALL of your
>>> EC2 nodes, not just those associated with the cluster. This has been
>>> documented before in HADOOP-1504
>>> (http://issues.apache.org/jira/browse/HADOOP-1504), and a fix was
>>> integrated way back on June 21, 2007. My questions are:
>>>
>>> 1) Is anyone else seeing this? I can reproduce this behavior
>>> consistently.
>>> 2) Is this a regression in the common code, a problem with the
>>> Cloudera scripts, or just user error on my part?
>>>
>>> Just trying to get to the bottom of this so no one else has to see all
>>> of their EC2 instances die accidentally :(
>>>
>>> Thanks!
>>>
>>> -Mark
>>>
>>
>
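[Editor's note: the name collision Tom's workaround guards against can be sketched in a few lines of Python. This is a hypothetical illustration, not the actual Hadoop/Cloudera script code; the instance IDs and the group-matching logic are invented to show why a cluster named "default" can sweep up unrelated instances that sit in EC2's generic "default" security group.]

```python
# Cluster name doubles as a security-group name in this sketch.
CLUSTER = "default"

# (instance_id, security groups) pairs. "i-other" is an unrelated
# instance (e.g. Mark's OpenSolaris box) that happens to be in EC2's
# generic "default" group.
instances = [
    ("i-master", ["default", "default-master"]),
    ("i-slave1", ["default", "default-slave"]),
    ("i-other",  ["default", "mark-default"]),  # NOT part of the cluster
]

# Selecting by membership in the group named after the cluster catches
# the outsider too, because "default" is also EC2's catch-all group:
to_kill = [iid for iid, groups in instances if CLUSTER in groups]
# to_kill == ["i-master", "i-slave1", "i-other"]  -- i-other dies too

# Matching only the cluster-specific role groups avoids the collision:
role_groups = {CLUSTER + "-master", CLUSTER + "-slave"}
to_kill_safe = [iid for iid, groups in instances
                if role_groups & set(groups)]
# to_kill_safe == ["i-master", "i-slave1"]
```

Either half of Tom's workaround breaks the collision: a cluster name other than "default" means no unrelated instance shares the group, and keeping non-Hadoop instances out of the cluster group means the broad match is harmless.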
