Thanks Mike and David - this is good information. Basically, I'm simply trying to demonstrate what would be possible, so I wanted to make sure that attempting to exercise the features of ML in the AWS arena is feasible. Think of it as a proving ground where, for example, I may want to demonstrate that the failover functionality works, while simply noting the shortcomings of AWS (i.e., latency), which would be solved through a "more production" implementation within the data center.
It sounds like, if I've understood correctly, there shouldn't be anything that prevents me from implementing any feature of ML in AWS, but there isn't much experience with how well those features might perform.

Kindest Regards,
David

-----Original Message-----
From: general-boun...@developer.marklogic.com [mailto:general-boun...@developer.marklogic.com] On Behalf Of Lee, David
Sent: Monday, August 08, 2011 11:18 PM
To: General MarkLogic Developer Discussion
Subject: Re: [MarkLogic Dev General] EC2 AMI & ML

To achieve really good HA on AWS you need your instances scattered across multiple AWS geographical regions. Unfortunately that also induces latency. I have no idea how ML clusters behave under that kind of high latency.

As an anecdotal metric, the sites that survived the Amazon crash in April were the ones that were fault tolerant across multiple geographical regions (or were simply by chance not in the US-EAST region), and were load balanced or at least fail-over balanced across those regions.

HA and scaling in the cloud is a different beast than in a datacenter. On the other hand, I believe it has the potential to be vastly superior if approached correctly. Very few organizations can provide a multi-geographical region datacenter of cooperative VM instances with shared distributed persistent storage across the regions.

The sites that failed in April in AWS were those who didn't "RTFM". That is, they didn't architect around failure. They just assumed it wouldn't happen to them, or had an incorrect presumption of what could fail and didn't architect for it.

That said, it's not easy. Failure is the new success.
http://blog.calldei.com/2011/04/failure-is-new-success.html

----------------------------------------
David A. Lee
Senior Principal Software Engineer
Epocrates, Inc.
d...@epocrates.com
812-482-5224

-----Original Message-----
From: general-boun...@developer.marklogic.com [mailto:general-boun...@developer.marklogic.com] On Behalf Of Michael Blakeley
Sent: Monday, August 08, 2011 8:30 PM
To: General MarkLogic Developer Discussion
Subject: Re: [MarkLogic Dev General] EC2 AMI & ML

So you are looking for an HA solution? The strategy you describe - forest replication on EBS - should function inside AWS. But it may not perform very well, and may not satisfy your HA requirements.

HA is a tricky business in AWS, since the environment is not under direct control. For any potential solution, ask questions. What happens when an instance freezes up and won't continue or terminate? What happens if two instances end up on the same underlying hardware, and that hardware fails? What happens if EBS itself fails, as it did on 21 April? What happens if S3 fails?

The AWS environment is also quite a bit different than an enterprise data center. An EBS volume has a disk-like interface and is redundant like a RAID-5 LUN, but it doesn't behave exactly like either. Getting good performance from EBS can be challenging. Instances seem like hosts, but sometimes they don't act quite like hosts do. Even the network has its quirks.

I don't have any firm recommendations for AWS and HA to share right now, but my work so far suggests a range of options for different HA requirements and performance requirements. As in a data center, three nines looks fairly easy and there are a few plausible strategies. Four nines looks like a steeper challenge. In both cases, the solutions aren't quite the same as the ones I would propose for an enterprise data center.

-- Mike

On 8 Aug 2011, at 12:18 , Steiner, David J. (LNG-DAY) wrote:

> Hi Mike,
>
> So, if I wanted to do something like create a 3 node cluster for 5TB of data
> where each of the nodes is a failover for one of the other nodes, for
> instance.
> I would create the nodes (using an appropriate 64-bit Linux OS AMI) and then
> attach enough EBS storage to the 3 nodes - maybe 5TB each.
> And when I install MarkLogic, would all of the installed software go on the
> EBS storage? And how would I ensure that? Or would there be a reason for
> the /opt/MarkLogic to go to the "local" storage and the /var/opt/MarkLogic to
> go to EBS storage?
>
> If I do install it this way, and copy 1/3 of my data to each node (ML host),
> will the ML replication work?
>
> Thanks,
> David
>
> -----Original Message-----
> From: general-boun...@developer.marklogic.com
> [mailto:general-boun...@developer.marklogic.com] On Behalf Of Michael Blakeley
> Sent: Monday, August 08, 2011 2:44 PM
> To: General MarkLogic Developer Discussion
> Subject: Re: [MarkLogic Dev General] EC2 AMI & ML
>
> David, I think the "local instance storage" limit is an AWS limit specific to
> the instance type you might select: m1.xlarge has 850-GB, etc
> (http://aws.amazon.com/ec2/instance-types/), not to either server license.
> You can always connect more storage using EBS, though. EBS acts something
> like NAS or SAN storage, and can persist across instance stop-start cycles
> and can be moved from instance to instance within an availability zone, which
> can be useful. Instance storage (aka ephemeral storage) is more like local
> disk. Its performance can be more predictable, but it isn't as flexible as
> EBS. If you want 5-TB of storage on one host, I think you'd have to use EBS
> (http://aws.amazon.com/ebs/).
>
> -- Mike
>
> On 8 Aug 2011, at 11:28 , Steiner, David J. (LNG-DAY) wrote:
>
>> Just to make sure I understand correctly...
>>
>> If I want to use MarkLogic in EC2, then I can either pick the pay-per-use
>> license or community license, both of which have restrictions on content
>> size, i.e., where the most amount of storage I can get is: 1690 GB of local
>> instance storage.
>> Is this correct, or am I misunderstanding "local instance
>> storage" to be the total storage available for data in the ML database?
>>
>> I'm assuming there is a limit, so alternatively, if I want to use the
>> license that I already have, so that I can create a ML cluster that holds
>> 5TBs of data, for instance, then I need to use an AMI that just contains an
>> operating system and I would then have to install MarkLogic on it myself so
>> that I can enter the license key.
>>
>> The documentation at:
>> http://developer.marklogic.com/pubs/4.2/books/ec2.pdf only covers using the
>> established ML AMIs and not what to do if you want to use your existing
>> license.
>>
>> A search on the EC2 AMI page on "RightScale CentOS Linux" which is what is
>> mentioned in the guide produces 19 results. So, is there a recommended AMI
>> (or AMIs) to pick?
>>
>> Thanks,
>> David
>>
>> _______________________________________________
>> General mailing list
>> General@developer.marklogic.com
>> http://developer.marklogic.com/mailman/listinfo/general
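
For reference, the gap between the "three nines" and "four nines" targets mentioned in Mike's reply can be made concrete with a little arithmetic. What follows is only a minimal Python sketch, not anything taken from MarkLogic or AWS: the function name downtime_minutes_per_year and the 365-day year are assumptions chosen for illustration.

```python
# Assumption: a 365-day year, i.e. 525,600 minutes.
MINUTES_PER_YEAR = 365 * 24 * 60


def downtime_minutes_per_year(nines: int) -> float:
    """Return the yearly downtime budget, in minutes, for an availability
    target written as a count of nines (3 -> 99.9%, 4 -> 99.99%).

    Hypothetical helper for illustration only.
    """
    if nines < 1:
        raise ValueError("nines must be a positive integer")
    unavailability = 10.0 ** -nines  # e.g. 3 nines -> 0.001
    return MINUTES_PER_YEAR * unavailability


if __name__ == "__main__":
    for n in (3, 4):
        budget = downtime_minutes_per_year(n)
        print(f"{n} nines: about {budget:.1f} minutes of downtime per year")
```

Under these assumptions, three nines leaves a budget of roughly 8.8 hours of downtime per year, while four nines leaves under an hour, which is one way to read why the second target looks like the steeper challenge.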