Yes, Amazon has made EBS MUCH better in the last few months. If you create an EC2 instance with "EBS Optimized" enabled and attach a "Provisioned IOPS" EBS volume, you are effectively *guaranteed* that I/O rate (within 10%) 99.9% of the time. This is night and day compared to the normal EBS volumes, which are sporadic.

---> FROM AWS
"When attached to EBS-Optimized instances, Provisioned IOPS volumes are designed to deliver within 10% of the provisioned IOPS performance 99.9% of the time"
---
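
A rough sketch of the arithmetic behind a Provisioned IOPS volume, in Python. It assumes the figures quoted elsewhere in this thread (at most 10 IOPS per GB, $0.125 per GB-month of storage, $0.10 per provisioned IOPS-month) and the 10% tolerance from the AWS statement above. These are early-2013 numbers and the function name `piops_estimate` is purely illustrative; check current AWS pricing before relying on any of it.

```python
# Figures taken from this thread (early-2013 AWS numbers; assumptions today).
PRICE_PER_GB_MONTH = 0.125    # USD per GB-month of provisioned storage
PRICE_PER_IOPS_MONTH = 0.10   # USD per provisioned IOPS-month
IOPS_PER_GB = 10              # at most 10 IOPS per GB (1000 IOPS needs 100 GB)
TOLERANCE = 0.10              # "within 10% of the provisioned IOPS"


def piops_estimate(iops, size_gb=None):
    """Rough monthly estimate for one Provisioned IOPS volume (hypothetical helper)."""
    min_size_gb = iops / IOPS_PER_GB
    if size_gb is None:
        size_gb = min_size_gb          # default to the smallest allowed volume
    if size_gb < min_size_gb:
        raise ValueError(f"{iops} IOPS needs at least {min_size_gb:g} GB")
    storage_usd = size_gb * PRICE_PER_GB_MONTH
    iops_usd = iops * PRICE_PER_IOPS_MONTH
    return {
        "size_gb": size_gb,
        "storage_usd": storage_usd,
        "iops_usd": iops_usd,
        "total_usd": storage_usd + iops_usd,
        # Lower edge of the "within 10%" band AWS describes
        "floor_iops": iops * (1 - TOLERANCE),
    }


if __name__ == "__main__":
    for iops in (500, 1000):
        print(iops, piops_estimate(iops))
```

Under these assumptions the IOPS charge dominates the bill: the storage charge for a minimum-size volume is small by comparison.
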
To get 20 MB/sec you need to provision a 500 IOPS EBS volume (minimum 50 GB). At least this is what I measure. This will cost you about $50/month in IOPS charges. Try it! You will be surprised at the improvement in the new EBS and EC2 instances. Don't make the mistake of relying on your experience of even 1 year ago ... it's a whole new world now. Provisioned IOPS is a total game changer. Add to that the super-duper memory EC2 instances and you've got some serious crack. But even an m1.large or m1.xlarge is pretty amazing when you give it provisioned IOPS.

-----------------------------------------------------------------------------
David Lee
Lead Engineer
MarkLogic Corporation
[email protected]
Phone: +1 812-482-5224
Cell: +1 812-630-7622
www.marklogic.com

-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of Michael Blakeley
Sent: Sunday, February 17, 2013 3:10 PM
To: MarkLogic Developer Discussion
Subject: Re: [MarkLogic Dev General] RAM Rich, I/O Poor in the Cloud

My rule of thumb is to try for 20-MB/sec merges. That means each forest needs to maintain 20-MB/sec reads and 20-MB/sec writes, at the same time. Storage benchmarks often test one or the other.

The problem I had with either multi-EBS approach was that one or two volumes always seemed to be much slower than the others. So everything tended to wait for the slowest volume. But perhaps Amazon has made them more consistent recently?

-- Mike

On 17 Feb 2013, at 05:33, David Lee <[email protected]> wrote:

> FYI you can RAID EBS Provisioned IOPS volumes to achieve higher bandwidth,
> or put one forest on each and ML will automatically write and read in
> parallel ...
> which may be an easier solution to manage (if you RAID EBS volumes you have to
> make sure they all move together).
>
> -----Original Message-----
> From: [email protected]
> [mailto:[email protected]] On Behalf Of David Lee
> Sent: Sunday, February 17, 2013 8:24 AM
> To: MarkLogic Developer Discussion
> Subject: Re: [MarkLogic Dev General] RAM Rich, I/O Poor in the Cloud
>
> That page is talking about RDS (MySQL or Oracle). You cannot buy these
> for EBS on EC2; it is not what you can get with EBS.
> If you have access to an AWS account, try it ... you don't have to commit all
> the way.
> You are given a choice between 100 and 2000 IOPS for EBS, and you must
> provision a volume whose size in GB is at least 1/10 of the IOPS
> (e.g. to get a 1000 IOPS EBS volume you need a 100 GB volume).
> The higher IOPS they quote for MySQL or Oracle are for local storage, which is
> equivalent to the new High IO EC2 instances,
> NOT the EBS instances. But still it's not apples to apples because they don't
> quote block size, so you can't calculate
> I/O data rates. I ran experiments and found that the EBS single-volume rates
> pegged at 40 MB/sec.
> Whether this is good or bad I can't say, but it is as good as I get on a fast local
> disk, and it's sustained.
>
> But yes, it's complicated! How ML will actually perform on the various
> configurations is tough to predict, IMHO.
> You may decide you need to try out a few configurations until you are
> satisfied.
> My simple 1-host node on an xlarge instance with 1000 IOPS went "really fast"
> ...
how's that for scientific :)
>
> -----Original Message-----
> From: [email protected]
> [mailto:[email protected]] On Behalf Of Ron Hitchens
> Sent: Sunday, February 17, 2013 4:36 AM
> To: MarkLogic Developer Discussion
> Subject: Re: [MarkLogic Dev General] RAM Rich, I/O Poor in the Cloud
>
> The info I have is that IOPS can be provisioned
> from 1,000 to 10,000 and Google dragged back this
> page which gives the same numbers:
>
> http://aws.amazon.com/about-aws/whats-new/2012/09/25/announcing-provisioned-iops-for-amazon-rds/
>
> Someone else has done the initial costing on various
> configurations and the moaning centers around the cost
> of paying for database-server-level I/O bandwidth.
>
> But I agree with you: lots of RAM and careful tuning
> may get it done without the cost and complexity. Hence
> this discussion.
>
> On Feb 17, 2013, at 12:14 AM, David Lee <[email protected]> wrote:
>
>> I understand it's complicated (IT IS!)
>> But I think you might have an off-by-10x in your calculations.
>> AWS Provisioned EBS IOPS come at MAX 2000 (not 10,000) and I have not been
>> able to achieve rates faster than 1000.
>> Certainly this is nothing compared to SSDs, but it's an order of magnitude
>> over regular EBS volumes and reasonably fast:
>> 40 MBytes/sec sustained at about $100/TB/Month/Volume. Couple this with
>> lots of RAM and after a short period your system shouldn't be hitting the
>> disk often (I hope ...).
>> So IMHO I wouldn't discard this offhand as "too expensive" ...
>> With 1000 IOPS and a big-memory machine and some warming up I think you can
>> achieve very good performance.
>> But yes, it's complicated ... note that IOPS don't hint at bandwidth ... they
>> seem to run 16k block I/O regardless of the filesystem block size.
>>
>> Of course no one can beat local SSDs for speed ... but ... THEY are
>> expensive.
>>
>> -----Original Message-----
>> From: [email protected]
>> [mailto:[email protected]] On Behalf Of Ron Hitchens
>> Sent: Saturday, February 16, 2013 5:19 PM
>> To: MarkLogic Developer Discussion
>> Subject: Re: [MarkLogic Dev General] RAM Rich, I/O Poor in the Cloud
>>
>> Right, but the cost needs to be multiplied by
>> all the nodes in all the clusters. And our ops people
>> want to put multiple EBS volumes on each node.
>>
>> But the source of the debate comes from comparing I/O
>> speed for PIOPs (1,000 to about 10,000) against SSDs (up to
>> around 150,000) and in-RAM speed (even faster, obviously).
>>
>> Our needs are focused less on I/O throughput and more
>> on fast data access. In that light, 1,000 IOPs seems
>> pretty slow. That is why I'm looking at alternative ways
>> of structuring the system to minimize I/O (so we can go
>> with the lowest PIOP tier) but still get super-fast data
>> access without paying for higher PIOP tiers (which shoot
>> up in cost quickly) or SSDs which will require additional
>> deployment complexity to get the data loaded onto them (and
>> are not available in all AWS zones).
>>
>> Also, it's not a simple comparison of cost of RAM vs
>> cost of disk (or more accurately I/O speed), it's also
>> a complexity management issue. Figuring out what means
>> what in AWS and how the various options interact and making
>> multiple instances talk to each other properly (both in
>> terms of AWS configuration and corporate governance on
>> our end) quickly becomes a tangle of dependencies.
This is
>> why I'm trying to determine if just paying for a big RAM
>> instance and a minimal guaranteed level of I/O performance
>> will be a better cost/benefit ratio once everything is
>> factored in.
>>
>> On Feb 16, 2013, at 8:57 PM, David Lee <[email protected]> wrote:
>>
>>> This is interesting! RAM cheaper than disk? (I know it's complicated ...
>>> but it's an interesting evolution in the market).
>>>
>>> So 'how expensive' is provisioned IO? I have found in my few tests that
>>> using 1000 IOPS I can get a sustained throughput of 40 MB/sec read and write
>>> forever. If this is really a RAM-backed, mostly read-only system then your
>>> I/O operations will be few.
>>>
>>> Costs for 1000 IOPS
>>> http://aws.amazon.com/ebs/
>>>
>>> $0.125 per GB-month of provisioned storage
>>> $0.10 per provisioned IOPS-month
>>>
>>> So per month for, say, a 100 GB volume (minimum size for 1000 IOPS):
>>> $12.50 / month storage
>>> $100 / month for IOPS (1000 IOPS)
>>>
>>> Is that beyond your budget?
>>>
>>> -----Original Message-----
>>> From: [email protected]
>>> [mailto:[email protected]] On Behalf Of Ron Hitchens
>>> Sent: Saturday, February 16, 2013 1:50 PM
>>> To: MarkLogic Developer Discussion
>>> Subject: [MarkLogic Dev General] RAM Rich, I/O Poor in the Cloud
>>>
>>> I'm trying to work out the best way to deploy a system
>>> I'm designing into the cloud on AWS. We've been through
>>> various permutations of AWS configurations and the main
>>> thing we've learned is that there is a lot of uncertainty
>>> and unpredictability around I/O performance in AWS.
>>>
>>> It's relatively expensive to provision guaranteed, high
>>> performance I/O.
We're testing an SSD solution at the
>>> moment, but that is ephemeral (lost if the VM shuts down)
>>> and very expensive. That's not a deal-killer for our
>>> architecture, but makes it more complicated to deploy
>>> and strains the ops budget.
>>>
>>> RAM, on the other hand, is relatively cheap to add to
>>> an AWS instance. The total database size, at present, is
>>> under 20GB and will grow relatively slowly. Provisioning
>>> an AWS instance with ~64GB of RAM is fairly cost effective,
>>> but the persistent EBS storage is sloooow.
>>>
>>> So, I have two questions:
>>>
>>> 1) Is there a best practice to tune MarkLogic where
>>> RAM is plentiful (twice the size of the data or more) so
>>> as to maximize caching of data? Ideally, we'd like the
>>> whole database loaded into RAM. This system will run as
>>> a read-only replica of a master database located elsewhere.
>>> The goal is to maximize query performance, but updates of
>>> relatively low frequency will be coming in from the master.
>>>
>>> The client is a Windows shop, but Linux is an approved
>>> solution if need be. Are there exploitable differences at
>>> the OS level that can improve filesystem caching? Are there
>>> RAM disk or configuration tricks that would maximize RAM
>>> usage without affecting update persistence?
>>>
>>> 2) Given #1 could lead to a mostly RAM-based configuration,
>>> does it make sense to go with a single high-RAM, high-CPU
>>> E+D-node that serves all requests with little or no actual I/O?
>>> Or would it be an overall win to cluster E-nodes in front of
>>> the big-RAM D-node to offload query evaluation and pay the
>>> (10-gb) network latency penalty for inter-node comms?
>>>
>>> We do have the option of deploying multiple standalone
>>> big-RAM E+D-nodes, each of which is a full replica of the data
>>> from the master. This would basically give us the equivalent
>>> of failover redundancy, but at the load balancer level rather
>>> than within the cluster.
This would also let us disperse
>>> them across AZs and regions without worrying about split-brain
>>> cluster issues.
>>>
>>> Thoughts? Recommendations?
>>>
>>> ---
>>> Ron Hitchens {mailto:[email protected]} Ronsoft Technologies
>>> +44 7879 358 212 (voice) http://www.ronsoft.com
>>> +1 707 924 3878 (fax) Bit Twiddling At Its Finest
>>> "No amount of belief establishes any fact." -Unknown
>>>
>>> _______________________________________________
>>> General mailing list
>>> [email protected]
>>> http://developer.marklogic.com/mailman/listinfo/general
