I've done some evaluations with smallish clusters. I'm cautious about planning 
a large cluster on AWS for a couple of reasons.

One of the basic rules of clustering is to avoid network latency within the 
cluster. But every AWS instance is a minimum of two hops from its closest 
neighbor, with the hypervisors acting as gateways. That closest neighbor will 
be co-resident on the same hypervisor, so for HA purposes you might even 
restrict yourself to instances at least 4-5 hops away. See 
http://blakeley.com/blogofile/2011/11/21/testing-aws-ec2-instances-for-co-residence/
for more on co-residency and how to avoid it.
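The checks in that post are more involved, but a crude proxy is hop count: a co-resident neighbor tends to sit only a hop or two away. Here's a rough sketch of that idea (the peer IPs are placeholders, and plain `traceroute` output formats vary, so treat this as a starting point, not a definitive test):

```python
import re
import subprocess

def hop_count(traceroute_output):
    """Count the numbered hop lines in plain `traceroute` output.
    The header line ("traceroute to ...") doesn't start with a digit,
    so only actual hops are counted."""
    return len(re.findall(r"^\s*\d+\s", traceroute_output, re.MULTILINE))

def hops_to(host):
    """Run traceroute to `host` and return the number of hops."""
    out = subprocess.run(["traceroute", "-n", host],
                         capture_output=True, text=True).stdout
    return hop_count(out)

# Flag instance pairs that look suspiciously close (possible co-residency).
# peers = ["10.0.1.12", "10.0.2.7"]   # hypothetical internal IPs
# for p in peers:
#     if hops_to(p) <= 2:
#         print(p, "may be co-resident; consider replacing it")
```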

So I'd try to minimize the cluster size by maximizing the size of the data host 
instances. This has the happy side-effect of making it more likely that you 
have the hypervisor all to yourself, which should remove the hypervisor 
co-residency problem. If possible I would use this strategy to run single-host 
clusters exclusively.

I'm not sure that you care about HA, but it is one potential driver for a 
multi-instance cluster. Some folks grow their clusters almost entirely for HA 
purposes, using replica forests within a cluster. As alluded to above, though, 
it requires care to avoid co-residency problems: instances mounting forest 
replicas on the same hypervisor aren't going to provide much redundancy. EBS is 
more of a black box, so it isn't clear that a replica forest will help in the 
event of EBS failures.

So it may be better to treat AWS as the local HA solution, ignoring local 
failover. Instead worry only about what happens when AWS fails entirely - as it 
seems to do about once annually. Such events can be treated as "disasters" and 
handled via database replication or flex-rep to a foreign cluster in another 
data center.

Getting back to instance tuning, stock EBS can certainly be a bottleneck. With 
update-heavy workloads I've seen the CPU outstrip EBS enough to generate 
XDMP-TOOMANYSTANDS errors. Even with RAID-0 and many EBS volumes it's tricky to 
get consistent merge rates above 10 MB/sec. As Wayne and David mentioned, the 
new EBS options are supposed to help, but I don't have direct experience with 
that yet. I can tell you that EBS performance shifts around as demand changes: 
Christmas shopping season seems to have a pronounced effect. And no matter how 
many EBS volumes you mount or RAID together, the traffic all goes through the 
same network interface. So having 10-Gbit interfaces is supposed to help.
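A quick way to sanity-check what a given volume (or RAID set) can sustain is to time a sequential fsync'd write and compute the rate. This is only a rough sketch (the mount path is a placeholder, and real merge I/O patterns differ), but it's enough to compare configurations or to spot the seasonal drift I mentioned:

```python
import os
import time

def write_throughput_mb_s(path, total_mb=64, chunk_mb=4):
    """Sequentially write `total_mb` MiB to `path` and return MB/s.
    fsync is called so the OS page cache doesn't inflate the number."""
    chunk = b"\0" * (chunk_mb * 1024 * 1024)
    start = time.time()
    with open(path, "wb") as f:
        for _ in range(total_mb // chunk_mb):
            f.write(chunk)
        f.flush()
        os.fsync(f.fileno())
    elapsed = time.time() - start
    os.remove(path)
    return total_mb / elapsed

# Example: compare a RAID-0 mount against a single EBS volume.
# print(write_throughput_mb_s("/mnt/ebs-raid0/throughput.tmp"))
# print(write_throughput_mb_s("/mnt/ebs-single/throughput.tmp"))
```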

The OS itself can also be a bottleneck. I suspect Amazon has done some tuning 
of Linux for their http://aws.amazon.com/amazon-linux-ami/ offering, but my 
testing suggests that this only makes much of a difference when multiple 
demanding instances share a hypervisor. Still, it might be worth comparing the 
Amazon Linux AMIs with Windows AMIs. You might find a compelling reason to 
switch.

This might be another reason to use a very large instance type to ensure 
exclusive use of the hypervisor. You may also want to check the CPU type when 
bringing up new instances, and reject any that have older CPU models. Some 
zones still have quite a few older 4-digit Opteron CPUs, and you will notice 
the difference in performance if you get one of these.
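A minimal sketch of that check, run right after launch (the Opteron pattern is my own assumption; adjust the reject rule for whatever models you consider too old):

```python
import re

def cpu_model(cpuinfo_text):
    """Extract the first 'model name' entry from /proc/cpuinfo text."""
    m = re.search(r"^model name\s*:\s*(.+)$", cpuinfo_text, re.MULTILINE)
    return m.group(1).strip() if m else ""

def is_acceptable(model):
    """Reject the older 4-digit Opteron models; accept everything else.
    The pattern is an assumption -- tune it to your own fleet."""
    return not re.search(r"Opteron.*\b\d{4}\b", model)

# On a fresh instance:
# with open("/proc/cpuinfo") as f:
#     model = cpu_model(f.read())
# if not is_acceptable(model):
#     terminate this instance and launch a replacement
```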

-- Mike

On 8 Jan 2013, at 05:46 , Ron Hitchens <[email protected]> wrote:

> 
>   We tried the EBS Optimized option and that hasn't made
> much of a difference either.  I suppose RAIDing across EBS
> is a way to go, but I'm afraid that would fall outside the
> comfort zone of the people administering this stuff.
> 
>   I'll have them look into the Provisioned IOPs thing.  What
> I really want is high-performance local disk to meet the
> performance targets we have.
> 
>   Thanks for the help.
> 
>   Is anybody out there actually running large-ish production
> MarkLogic clusters in the cloud?
> 
> On Jan 8, 2013, at 12:35 PM, David Lee wrote:
> 
>> Almost certainly as Wayne suggests your bottleneck is IO.
>> 
>> The default storage is EBS which is a type of network SAN.
>> Some instance types have "EBS Optimized" which you should try.
>> This gives a dedicated network channel to EBS.
>> Then add RAID across the EBS for extra fun.
>> 
>> Even better as Wayne suggests is instances with "Provisioned IOPS"
>> or some of the truly amazing DB oriented instances with tons of local 
>> storage.
>> 
>> Also you could consider using Ephemeral Storage, however as the name
>> suggests it will not last beyond the instance life.
>> 
>> 
>> -----------------------------------------------------------------------------
>> David Lee
>> Lead Engineer
>> MarkLogic Corporation
>> [email protected]
>> Phone: +1 812-482-5224
>> Cell:  +1 812-630-7622
>> www.marklogic.com
>> 
>> 
>> 
>> -----Original Message-----
>> From: [email protected] 
>> [mailto:[email protected]] On Behalf Of Wayne Feick
>> Sent: Tuesday, January 08, 2013 7:20 AM
>> To: General Mark Logic Developer Discussion
>> Subject: Re: [MarkLogic Dev General] MarkLogic in AWS Cloud
>> 
>> I don't have a lot of experience with it, but EBS volumes have limited 
>> bandwidth. Some people have had success striping across multiple EBS volumes 
>> from within Linux instances. You could also look at the more recent 
>> guaranteed IOPs capability Amazon now offers.
>> 
>> Wayne
>> 
>> Ron Hitchens <[email protected]> wrote:
>> 
>> 
>>  Has anyone had any experience configuring and running non-trivial
>> MarkLogic clusters in the cloud?  Specifically Amazon EC2 VMs?
>> 
>>  I've got a test cluster of three nodes setup in AWS and am trying
>> to figure out the best configuration for it.  The system seems to be
>> quite slow at some things, but reasonably fast at others.  Bumping
>> the VM up to bigger instances (more ram, more cores) doesn't seem to
>> have a significant impact on speed or throughput.
>> 
>>  I suspect I/O bandwidth may be the culprit, but that's just a
>> hunch.  Does anyone have any experience with tuning EC2 VMs?
>> 
>>  The test environment I'm working with now is three m2.xlarge
>> instances (32gb RAM, 4 cores, "high" network speed).  The OS is
>> Windows (groan, I don't have a choice there).  Production cluster(s)
>> are likely to be similar, but probably six nodes or so.
>> 
>>  Any advice//war stories/dire warnings greatly appreciated.
>> 
>>  Thanks.
>> 
>> ---
>> Ron Hitchens {mailto:[email protected]}   Ronsoft Technologies
>>    +44 7879 358 212 (voice)          http://www.ronsoft.com
>>    +1 707 924 3878 (fax)              Bit Twiddling At Its Finest
>> "No amount of belief establishes any fact." -Unknown
>> 
>> 
>> 
>> 
> 
> ---
> Ron Hitchens {mailto:[email protected]}   Ronsoft Technologies
>     +44 7879 358 212 (voice)          http://www.ronsoft.com
>     +1 707 924 3878 (fax)              Bit Twiddling At Its Finest
> "No amount of belief establishes any fact." -Unknown
> 
> 
> 
> 
> 

_______________________________________________
General mailing list
[email protected]
http://developer.marklogic.com/mailman/listinfo/general
