Re: [MarkLogic Dev General] EC2 AMI & ML

Steiner, David J. (LNG-DAY) Tue, 09 Aug 2011 05:56:19 -0700

Thanks Mike and David - this is good information.

Basically, what I'm involved in is simply trying to demonstrate what would be 
possible, so I wanted to make sure that it is feasible to exercise the features 
of ML in the AWS arena.  Sort of like a proving ground where, for example, I 
may want to demonstrate that the failover functionality works (while simply 
noting the shortcomings of AWS, e.g., latency, which would be solved through a 
"more production" implementation within the data center).

It sounds like, if I've understood correctly, that there shouldn't be anything 
that prevents me from implementing any feature of ML in AWS, but there isn't 
much experience with how well those features might perform.

Kindest Regards,
David

-----Original Message-----
From: general-boun...@developer.marklogic.com 
[mailto:general-boun...@developer.marklogic.com] On Behalf Of Lee, David
Sent: Monday, August 08, 2011 11:18 PM
To: General MarkLogic Developer Discussion
Subject: Re: [MarkLogic Dev General] EC2 AMI & ML

To achieve really good HA on AWS you need your instances scattered across 
multiple AWS geographical regions.
Unfortunately that also introduces latency.  I have no idea how ML clusters 
behave in high-latency situations like that.

As an anecdotal metric, the sites that survived the Amazon crash in April were 
ones that were fault tolerant across multiple geographical regions (or simply, 
by chance, were not in the US-EAST region) and were load balanced, or at least 
fail-over balanced, across those regions.  HA and scaling in the cloud is a 
different beast than in a datacenter.
On the other hand, I believe it has potential to be vastly superior if 
approached correctly.   Very few organizations can provide a multi-geographical 
region datacenter of cooperative VM instances with shared distributed 
persistent storage across the regions.

The sites that failed in April in AWS were those that didn't "RTFM".  That is, 
they didn't architect around failure.  They just assumed it wouldn't happen to 
them, or had an incorrect presumption of what could fail and didn't architect 
for it.
That said, it's not easy.

Failure is the new success.
http://blog.calldei.com/2011/04/failure-is-new-success.html

----------------------------------------
David A. Lee
Senior Principal Software Engineer
Epocrates, Inc.
d...@epocrates.com
812-482-5224

-----Original Message-----
From: general-boun...@developer.marklogic.com 
[mailto:general-boun...@developer.marklogic.com] On Behalf Of Michael Blakeley
Sent: Monday, August 08, 2011 8:30 PM
To: General MarkLogic Developer Discussion
Subject: Re: [MarkLogic Dev General] EC2 AMI & ML

So you are looking for an HA solution? The strategy you describe - forest 
replication on EBS - should function inside AWS. But it may not perform very 
well, and may not satisfy your HA requirements.
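
For the scenario in the quoted question below (3 nodes, 5 TB, each node acting 
as failover for one other node), a rough sizing sketch follows. It assumes the 
data is split evenly and that each share is stored twice (one primary, one 
replica); it ignores index expansion, merge headroom, and any per-volume EBS 
size limit. The function name and parameters are illustrative only, not part 
of any MarkLogic or AWS API.

```python
# Rough per-node storage estimate for a replicated cluster.
# Assumptions (not from the thread): data is spread evenly across nodes,
# and every share exists `copies` times (primary + replicas).

def per_node_storage_tb(total_data_tb: float, nodes: int, copies: int = 2) -> float:
    """Return the raw storage each node must hold, in TB."""
    if nodes < 1 or copies < 1:
        raise ValueError("nodes and copies must be positive")
    if copies > nodes:
        raise ValueError("cannot place more copies than there are nodes")
    # Total stored bytes are total_data * copies, spread evenly over nodes.
    return total_data_tb * copies / nodes

if __name__ == "__main__":
    needed = per_node_storage_tb(5.0, nodes=3)
    print(f"Each node needs roughly {needed:.2f} TB before any headroom")
```

Under these assumptions each node holds about two thirds of the total data 
set, so attaching the full 5 TB to every node would leave some slack for 
overhead rather than being strictly required.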

HA is a tricky business in AWS, since the environment is not under direct 
control. For any potential solution, ask questions. What happens when an 
instance freezes up and won't continue or terminate? What happens if two 
instances end up on the same underlying hardware, and that hardware fails? What 
happens if EBS itself fails, as it did on 21 April? What happens if S3 fails?

The AWS environment is also quite a bit different than an enterprise data 
center. An EBS volume has a disk-like interface and is redundant like a RAID-5 
LUN, but it doesn't behave exactly like either. Getting good performance from 
EBS can be challenging. Instances seem like hosts, but sometimes they don't act 
quite like hosts do. Even the network has its quirks.

I don't have any firm recommendations for AWS and HA to share right now, but my 
work so far suggests a range of options for different HA requirements and 
performance requirements. As in a data center, three nines looks fairly easy 
and there are a few plausible strategies. Four nines looks like a steeper 
challenge. In both cases, the solutions aren't quite the same as the ones I 
would propose for an enterprise data center. 
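
To put those targets in concrete terms, here is a minimal sketch of the yearly 
downtime budget implied by a given number of nines. It assumes a 365-day year 
and counts all downtime equally; the function name is purely illustrative.

```python
# Downtime budget implied by an availability target ("number of nines").
# Assumption: a 365-day year; planned and unplanned downtime count alike.

MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes

def downtime_minutes_per_year(nines: int) -> float:
    """Allowed downtime per year for an availability of `nines` nines."""
    if nines < 1:
        raise ValueError("nines must be at least 1")
    unavailability = 10 ** (-nines)  # e.g. 3 nines -> 0.001
    return MINUTES_PER_YEAR * unavailability

if __name__ == "__main__":
    for n in (3, 4):
        print(f"{n} nines: {downtime_minutes_per_year(n):.1f} minutes/year")
```

Three nines leaves a little under nine hours of downtime a year, while four 
nines leaves under an hour, which is one way to see why the second target is 
so much harder to meet.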

-- Mike

On 8 Aug 2011, at 12:18 , Steiner, David J. (LNG-DAY) wrote:

> Hi Mike,
> 
> So, suppose I wanted to create a 3-node cluster for 5TB of data where each 
> of the nodes is a failover for one of the other nodes.
> I would create the nodes (using an appropriate 64-bit Linux OS AMI) and then 
> attach enough EBS storage to the 3 nodes - maybe 5TB each.
> And when I install MarkLogic, would all of the installed software go on the 
> EBS storage?  And how would I ensure that?  Or would there be a reason for 
> the /opt/MarkLogic to go to the "local" storage and the /var/opt/MarkLogic to 
> go to EBS storage?
> 
> If I do install it this way, and copy 1/3 of my data to each node (ML host), 
> will the ML replication work?
> 
> Thanks,
> David
> 
> -----Original Message-----
> From: general-boun...@developer.marklogic.com 
> [mailto:general-boun...@developer.marklogic.com] On Behalf Of Michael Blakeley
> Sent: Monday, August 08, 2011 2:44 PM
> To: General MarkLogic Developer Discussion
> Subject: Re: [MarkLogic Dev General] EC2 AMI & ML
> 
> David, I think the "local instance storage" limit is an AWS limit specific to 
> the instance type you might select: m1.xlarge has 850-GB, etc 
> (http://aws.amazon.com/ec2/instance-types/), not to either server license. 
> You can always connect more storage using EBS, though. EBS acts something 
> like NAS or SAN storage, and can persist across instance stop-start cycles 
> and can be moved from instance to instance within an availability zone, which 
> can be useful. Instance storage (aka ephemeral storage) is more like local 
> disk. Its performance can be more predictable, but it isn't as flexible as 
> EBS. If you want 5-TB of storage on one host, I think you'd have to use EBS 
> (http://aws.amazon.com/ebs/).
> 
> -- Mike
> 
> On 8 Aug 2011, at 11:28 , Steiner, David J. (LNG-DAY) wrote:
> 
>> Just to make sure I understand correctly...
>> 
>> If I want to use MarkLogic in EC2, then I can either pick the pay-per-use 
>> license or community license, both of which have restrictions on content 
>> size, i.e., the most storage I can get is 1690 GB of local instance 
>> storage.  Is this correct, or am I misunderstanding "local instance 
>> storage" to be the total storage available for data in the ML database?
>> 
>> I'm assuming there is a limit, so alternatively, if I want to use the 
>> license that I already have, so that I can create a ML cluster that holds 
>> 5TBs of data, for instance, then I need to use an AMI that just contains an 
>> operating system and I would then have to install MarkLogic on it myself so 
>> that I can enter the license key.
>> 
>> The documentation at 
>> http://developer.marklogic.com/pubs/4.2/books/ec2.pdf only covers using the 
>> established ML AMIs and not what to do if you want to use your existing 
>> license.
>> 
>> A search on the EC2 AMI page on "RightScale CentOS Linux" which is what is 
>> mentioned in the guide produces 19 results.  So, is there a recommended AMI 
>> (or AMIs) to pick?
>> 
>> Thanks,
>> David
>> 
>> 
>> _______________________________________________
>> General mailing list
>> General@developer.marklogic.com
>> http://developer.marklogic.com/mailman/listinfo/general
> 
> 

