This is interesting! RAM cheaper than disk? (I know it's complicated... but it's an interesting evolution in the market.)
So 'how expensive' is provisioned IO? In my few tests, 1000 provisioned IOPS gave me a sustained throughput of 40 MB/sec, read and write, indefinitely. If this is really a RAM-backed, mostly read-only system, then your I/O operations will be few.

Costs for 1000 IOPS (http://aws.amazon.com/ebs/):

  $0.125 per GB-month of provisioned storage
  $0.10 per provisioned IOPS-month

So per month, for say 100 GB (the minimum volume size for 1000 IOPS):

  $12.50/month for storage
  $100/month for 1000 provisioned IOPS

(A quick back-of-envelope sketch of this arithmetic appears after the quoted message below.)

Is that beyond your budget?

-----------------------------------------------------------------------------
David Lee
Lead Engineer
MarkLogic Corporation
[email protected]
Phone: +1 812-482-5224
Cell:  +1 812-630-7622
www.marklogic.com

-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of Ron Hitchens
Sent: Saturday, February 16, 2013 1:50 PM
To: MarkLogic Developer Discussion
Subject: [MarkLogic Dev General] RAM Rich, I/O Poor in the Cloud

I'm trying to work out the best way to deploy a system I'm designing into the cloud on AWS. We've been through various permutations of AWS configurations, and the main thing we've learned is that there is a lot of uncertainty and unpredictability around I/O performance in AWS. It's relatively expensive to provision guaranteed, high-performance I/O.

We're testing an SSD solution at the moment, but that storage is ephemeral (lost if the VM shuts down) and very expensive. That's not a deal-killer for our architecture, but it makes deployment more complicated and strains the ops budget.

RAM, on the other hand, is relatively cheap to add to an AWS instance. The total database size, at present, is under 20GB and will grow relatively slowly. Provisioning an AWS instance with ~64GB of RAM is fairly cost effective, but the persistent EBS storage is sloooow.

So, I have two questions:

1) Is there a best practice for tuning MarkLogic where RAM is plentiful (twice the size of the data or more) so as to maximize caching of data? Ideally, we'd like the whole database loaded into RAM. This system will run as a read-only replica of a master database located elsewhere. The goal is to maximize query performance, but updates of relatively low frequency will be coming in from the master. The client is a Windows shop, but Linux is an approved solution if need be. Are there exploitable differences at the OS level that can improve filesystem caching? Are there RAM disk or configuration tricks that would maximize RAM usage without affecting update persistence?

2) Given that #1 could lead to a mostly RAM-based configuration, does it make sense to go with a single high-RAM, high-CPU E+D-node that serves all requests with little or no actual I/O? Or would it be an overall win to cluster E-nodes in front of the big-RAM D-node to offload query evaluation and pay the (10 Gb) network latency penalty for inter-node comms?

We do have the option of deploying multiple standalone big-RAM E+D-nodes, each of which is a full replica of the data from the master. This would basically give us the equivalent of failover redundancy, but at the load balancer level rather than within the cluster. This would also let us disperse them across AZs and regions without worrying about split-brain cluster issues.

Thoughts? Recommendations?

---
Ron Hitchens {mailto:[email protected]}
Ronsoft Technologies
+44 7879 358 212 (voice)          http://www.ronsoft.com
+1 707 924 3878 (fax)             Bit Twiddling At Its Finest
"No amount of belief establishes any fact." -Unknown
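A quick back-of-envelope sketch of the cost arithmetic above, in Python. The prices and the 100 GB / 1000 IOPS figures are the ones quoted from http://aws.amazon.com/ebs/ in this message; treat them as illustrative examples, not current AWS pricing.

    # Monthly cost estimate for an EBS provisioned-IOPS volume,
    # using the figures quoted earlier in this message (illustrative only).
    storage_price_per_gb_month = 0.125   # USD per GB-month of provisioned storage
    iops_price_per_iops_month  = 0.10    # USD per provisioned IOPS-month

    volume_gb        = 100    # minimum volume size quoted for 1000 IOPS
    provisioned_iops = 1000

    storage_cost = volume_gb * storage_price_per_gb_month         # 12.50
    iops_cost    = provisioned_iops * iops_price_per_iops_month   # 100.00

    print(f"Storage: ${storage_cost:.2f}/month")
    print(f"IOPS:    ${iops_cost:.2f}/month")
    print(f"Total:   ${storage_cost + iops_cost:.2f}/month")

    # The observed 40 MB/sec sustained at 1000 IOPS implies an average
    # request size of roughly 40 MB / 1000 ops = ~40 KB per I/O operation.

Running it prints $12.50 for storage, $100.00 for IOPS, and $112.50 total per month, matching the figures above.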
_______________________________________________
General mailing list
[email protected]
http://developer.marklogic.com/mailman/listinfo/general
