Right, but that cost needs to be multiplied by the number of nodes across all the clusters. And our ops people want to put multiple EBS volumes on each node.
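For what it's worth, here's a rough back-of-the-envelope sketch of how that multiplication plays out, using the EBS prices David quotes below ($0.125 per GB-month, $0.10 per provisioned IOPS-month). The node, volume and sizing figures are hypothetical placeholders, not our actual deployment numbers:

    # Rough monthly EBS provisioned-IOPS cost model (prices from David's mail below).
    # All deployment figures here are hypothetical examples, not our real numbers.
    STORAGE_PER_GB_MONTH = 0.125   # $ per GB-month of provisioned storage
    COST_PER_PIOPS_MONTH = 0.10    # $ per provisioned IOPS-month

    def monthly_ebs_cost(nodes, volumes_per_node, gb_per_volume, iops_per_volume):
        volumes = nodes * volumes_per_node
        storage = volumes * gb_per_volume * STORAGE_PER_GB_MONTH
        piops   = volumes * iops_per_volume * COST_PER_PIOPS_MONTH
        return storage + piops

    # Example: 9 nodes (3 clusters x 3 nodes), 2 volumes per node,
    # each volume 100 GB at 1,000 PIOPS:
    print(monthly_ebs_cost(9, 2, 100, 1000))   # -> 2025.0

So the ~$112.50/month for a single 100 GB / 1,000 PIOPS volume turns into roughly $2,025/month once it's multiplied out, before we even look at the higher PIOPS tiers.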
But the source of the debate comes from comparing I/O speed for PIOPS (1,000 to about 10,000) against SSDs (up to around 150,000) and in-RAM speed (even faster, obviously). Our needs are focused less on I/O throughput and more on fast data access. In that light, 1,000 IOPS seems pretty slow. That is why I'm looking at alternative ways of structuring the system to minimize I/O (so we can go with the lowest PIOPS tier) but still get super-fast data access, without paying for higher PIOPS tiers (which shoot up in cost quickly) or SSDs, which would require additional deployment complexity to get the data loaded onto them (and are not available in all AWS zones).

Also, it's not a simple comparison of the cost of RAM vs the cost of disk (or, more accurately, I/O speed); it's also a complexity management issue. Figuring out what means what in AWS, how the various options interact, and how to make multiple instances talk to each other properly (both in terms of AWS configuration and corporate governance on our end) quickly becomes a tangle of dependencies. This is why I'm trying to determine whether just paying for a big-RAM instance and a minimal guaranteed level of I/O performance will give a better cost/benefit ratio once everything is factored in.

On Feb 16, 2013, at 8:57 PM, David Lee <[email protected]> wrote:

> This is interesting! RAM cheaper than disk? (I know it's complicated ...
> but it's an interesting evolution in the market.)
>
> So how expensive is provisioned IO? I have found in my few tests that
> using 1,000 IOPS I can get a sustained throughput of 40 MB/sec read and
> write, forever. If this is really a RAM-backed, mostly read-only system
> then your IO operations will be few.
>
> Costs for 1,000 IOPS
> http://aws.amazon.com/ebs/
>
> $0.125 per GB-month of provisioned storage
> $0.10 per provisioned IOPS-month
>
> So per month for, say, 100 GB (minimum size for 1,000 IOPS):
> $12.50 / month for storage
> $100 / month for IOPS (1,000 IOPS)
>
> Is that beyond your budget?
>
> -----------------------------------------------------------------------------
> David Lee
> Lead Engineer
> MarkLogic Corporation
> [email protected]
> Phone: +1 812-482-5224
> Cell: +1 812-630-7622
> www.marklogic.com
>
>
> -----Original Message-----
> From: [email protected]
> [mailto:[email protected]] On Behalf Of Ron Hitchens
> Sent: Saturday, February 16, 2013 1:50 PM
> To: MarkLogic Developer Discussion
> Subject: [MarkLogic Dev General] RAM Rich, I/O Poor in the Cloud
>
> I'm trying to work out the best way to deploy a system
> I'm designing into the cloud on AWS. We've been through
> various permutations of AWS configurations and the main
> thing we've learned is that there is a lot of uncertainty
> and unpredictability around I/O performance in AWS.
>
> It's relatively expensive to provision guaranteed, high
> performance I/O. We're testing an SSD solution at the
> moment, but that is ephemeral (lost if the VM shuts down)
> and very expensive. That's not a deal-killer for our
> architecture, but it makes deployment more complicated
> and strains the ops budget.
>
> RAM, on the other hand, is relatively cheap to add to
> an AWS instance. The total database size, at present, is
> under 20GB and will grow relatively slowly. Provisioning
> an AWS instance with ~64GB of RAM is fairly cost effective,
> but the persistent EBS storage is sloooow.
>
> So, I have two questions:
>
> 1) Is there a best practice for tuning MarkLogic where
> RAM is plentiful (twice the size of the data or more) so
> as to maximize caching of data?
> Ideally, we'd like the
> whole database loaded into RAM. This system will run as
> a read-only replica of a master database located elsewhere.
> The goal is to maximize query performance, but updates of
> relatively low frequency will be coming in from the master.
>
> The client is a Windows shop, but Linux is an approved
> solution if need be. Are there exploitable differences at
> the OS level that can improve filesystem caching? Are there
> RAM disk or configuration tricks that would maximize RAM
> usage without affecting update persistence?
>
> 2) Given that #1 could lead to a mostly RAM-based configuration,
> does it make sense to go with a single high-RAM, high-CPU
> E+D-node that serves all requests with little or no actual I/O?
> Or would it be an overall win to cluster E-nodes in front of
> the big-RAM D-node to offload query evaluation and pay the
> (10 Gb) network latency penalty for inter-node comms?
>
> We do have the option of deploying multiple standalone
> big-RAM E+D-nodes, each of which is a full replica of the data
> from the master. This would basically give us the equivalent
> of failover redundancy, but at the load balancer level rather
> than within the cluster. This would also let us disperse
> them across AZs and regions without worrying about split-brain
> cluster issues.
>
> Thoughts? Recommendations?
>
> ---
> Ron Hitchens {mailto:[email protected]}   Ronsoft Technologies
> +44 7879 358 212 (voice)              http://www.ronsoft.com
> +1 707 924 3878 (fax)                 Bit Twiddling At Its Finest
> "No amount of belief establishes any fact." -Unknown

---
Ron Hitchens {mailto:[email protected]}   Ronsoft Technologies
+44 7879 358 212 (voice)              http://www.ronsoft.com
+1 707 924 3878 (fax)                 Bit Twiddling At Its Finest
"No amount of belief establishes any fact." -Unknown

_______________________________________________
General mailing list
[email protected]
http://developer.marklogic.com/mailman/listinfo/general
