On Mon, Oct 19, 2009 at 5:34 PM, Mark Stetzer <[email protected]> wrote:
> Hi Tom,
>
> The terminate-cluster script only lists the instances that are part of
> the cluster (master and all slaves) as far as I can tell.  As an
> example, I set up a cluster of 1 master and 5 slaves, then started an
> additional non-Hadoop server via the AWS mgmt. console running a
> completely different AMI (OpenSolaris 2009.06 just to be very
> different).  terminate-cluster only listed the 6 instances that were
> part of the cluster if I remember correctly.
>
> I have 4 security groups:  default, default-master, default-slave, and
> mark-default.  mark-default wasn't even added until after I started
> the Hadoop cluster; I added it to log in to the OpenSolaris instance.

I think there is a bug here. I've filed
https://issues.apache.org/jira/browse/HADOOP-6320. As an immediate
workaround you can avoid calling the Hadoop cluster "default", and
make sure that you don't create non-Hadoop EC2 instances in the
cluster group.
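The workaround makes sense if the script identifies cluster instances by security-group name, matching either the cluster name exactly or a "<cluster>-" prefix (e.g. "default-master", "default-slave"). The sketch below is a hypothetical illustration of that suspected matching logic, not the actual Hadoop/Cloudera script; the instance IDs and group lists are made up to mirror Mark's setup. Under this assumption, a cluster named "default" sweeps in any unrelated instance that sits in EC2's built-in "default" group.

```python
# Hypothetical sketch of the suspected over-matching behaviour.
# NOT the real terminate-cluster code; names and IDs are illustrative.

def in_cluster(group_names, cluster_name):
    """Return True if any security group identifies this instance as a
    cluster member: an exact match on the cluster name, or a
    '<cluster>-' prefix such as 'default-master'."""
    return any(
        g == cluster_name or g.startswith(cluster_name + "-")
        for g in group_names
    )

# (instance_id, security groups) -- modelled on the setup in this thread.
instances = [
    ("i-master", ["default-master"]),            # Hadoop master
    ("i-slave1", ["default-slave"]),             # Hadoop slave
    ("i-solaris", ["default", "mark-default"]),  # unrelated OpenSolaris box
]

# Cluster named "default": the unrelated instance matches via EC2's
# built-in "default" group and would be terminated too.
doomed = [i for i, gs in instances if in_cluster(gs, "default")]
print(doomed)  # all three instances match

# Cluster named anything else: only the real cluster members match,
# which is exactly the suggested workaround.
safe = [i for i, gs in instances if in_cluster(gs, "hadoop1")]
print(safe)  # empty -- no group is "hadoop1" or "hadoop1-*"
```

This also explains why creating non-Hadoop instances inside the cluster's own group is dangerous regardless of the cluster name: group membership is the only signal the script has.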

Thanks,
Tom

>
> Does this help at all?  Thanks.
>
> -Mark
>
> On Mon, Oct 19, 2009 at 11:52 AM, Tom White <[email protected]> wrote:
>> Hi Mark,
>>
>> Sorry to hear that all your EC2 instances were terminated. Needless to
>> say, this should certainly not happen.
>>
>> The scripts are a Python rewrite (see HADOOP-6108) of the bash ones,
>> so HADOOP-1504 is not applicable, but the behaviour should be the same:
>> the terminate-cluster command lists the instances that it will
>> terminate, and prompts for confirmation that they should be
>> terminated. Is it listing instances that are not in the cluster? I
>> have used this script a lot and it has never terminated any instances
>> that are not in the cluster.
>>
>> What are the names of the security groups that the instances are in
>> (both those in the cluster, and those outside the cluster that are
>> inadvertently terminated)?
>>
>> Thanks,
>> Tom
>>
>> On Mon, Oct 19, 2009 at 4:41 PM, Mark Stetzer <[email protected]> wrote:
>>> Hey all,
>>>
>>> While running the (latest as of Friday) Cloudera-created EC2 scripts,
>>> I noticed that running the terminate-cluster script kills ALL of your
>>> EC2 nodes, not just those associated with the cluster.  This has been
>>> documented before in HADOOP-1504
>>> (http://issues.apache.org/jira/browse/HADOOP-1504), and a fix was
>>> integrated way back on June 21, 2007.  My questions are:
>>>
>>> 1)  Is anyone else seeing this?  I can reproduce this behavior consistently.
>>> AND
>>> 2)  Is this a regression in the common code, a problem with the
>>> Cloudera scripts, or just user error on my part?
>>>
>>> Just trying to get to the bottom of this so no one else has to see all
>>> of their EC2 instances die accidentally :(
>>>
>>> Thanks!
>>>
>>> -Mark
>>>
>>
>