While I'm at it, another question. Assuming it works well to set up a single E+D big-RAM system that runs super fast because it does virtually no I/O, where is the upper limit to scalability as concurrent requests go up?
Such a system in AWS would probably have 8-16 cores. Even with nearly frictionless data access, there will be an upper bound to how many queries can be evaluated in a given unit of time. How do we determine the crossover point where it's better to add E-nodes to spread the CPU load and let the big-RAM guy focus on data access? Would it make sense to go with n*E + 1*D from the start, where n can be dialed up and down easily? Or go with one monolithic E+D and just replicate it as the load goes up? The usage profile is likely to have peaks and valleys at fairly predictable times during the day/week.

On Feb 16, 2013, at 8:37 PM, Michael Blakeley <[email protected]> wrote:

> Ron, I believe that if you *can* fit your application onto one host without ridiculous hardware costs, you should. There are many actual and potential optimizations that can be done for local forests, but they break down once the forests are on different hosts.
>
> On Linux vs Windows: I think you'll have more control on Linux. You can tune the OS I/O caching and swappiness, as well as dirty page management. It might also help to know that default OS I/O caching behaves quite a bit differently between the two. I've done tests where I've set the compressed-tree cache size as low as 1 MB on Linux without measurable performance impact. The OS buffer cache already does pretty much the same job, so it picks up the slack. As I understand it, this would not be true on Windows.
>
> Getting everything into RAM is an interesting problem. You can turn on "pre-map data" for the database, although that makes forest mounts quite I/O intensive. If possible, use huge pages so the OS can more easily manage all the RAM you're planning to use. Next, I think you'll want to figure out how your forests break out into mappable files, ListData, and TreeData. You'll need available RAM for the mappable files, a list cache equal to the ListData, a compressed-tree cache equal to the TreeData, and an expanded tree cache equal to about 3x TreeData. Often ListData is something like 75% of forest size, TreeData is something like 20%, and mappable files make up the rest - but your database will likely be a little different, so measure it.
>
> But with those proportions, a 10000-MiB database would call for something like a 7500-MiB list cache, a 2000-MiB compressed-tree cache, a 6000-MiB expanded-tree cache, plus 500 MiB for mapped files: a total of about 1.6x the database size. Ideally you'd have *another* 1x so that the OS buffer cache could also cache all the files. You'll also want to account for the OS, merges, and in-memory stands for those infrequent updates. I expect 32 GB would be more than enough for this example. As noted above, you could skimp on the compressed-tree cache if the OS buffer cache will do something similar, but that's a marginal savings.
>
> At some point in all this, you'll want to copy all the data into RAM. Slow I/O will impede that, especially if everything is done using random reads. One way to manage this is to use a ramdisk, but that has the disadvantage of requiring another large slice of memory: probably more than 2x to allow merge space, since you will have some updates. You could easily load up the OS buffer cache with something like 'find /mldata | xargs wc', as long as the buffer cache doesn't outsmart you. On Linux, where fadvise is available, using 'willneed' should help. This approach doesn't load the tree caches, but database reads should come from the buffer cache instead of driving more I/O.
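(A note to self here: turning your sizing rules above into quick arithmetic. The figures are just your 10000-MiB worked example, not measurements from our forests - the idea is to plug in real ListData/TreeData sizes once we have them.)

    xquery version "1.0-ml";
    (: Rough cache-sizing arithmetic per the rules of thumb above.
       Sizes are the 10000-MiB example from the thread; substitute
       measured ListData/TreeData/mappable-file sizes from real forests. :)
    let $list-data-mb := 7500   (: ~75% of forest size :)
    let $tree-data-mb := 2000   (: ~20% of forest size :)
    let $mapped-mb    := 500    (: mappable files: the remainder :)
    return
      <cache-plan>
        <list-cache-mb>{ $list-data-mb }</list-cache-mb>
        <compressed-tree-cache-mb>{ $tree-data-mb }</compressed-tree-cache-mb>
        <expanded-tree-cache-mb>{ 3 * $tree-data-mb }</expanded-tree-cache-mb>
        <ram-for-mapped-files-mb>{ $mapped-mb }</ram-for-mapped-files-mb>
        <total-mb>{ $list-data-mb + (4 * $tree-data-mb) + $mapped-mb }</total-mb>
      </cache-plan>

That comes out to 16000 MiB for the example, i.e. the ~1.6x of database size quoted above, before the extra ~1x for the OS buffer cache.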
> If you're testing the buffer cache in Linux, it helps to know how to clear it: http://www.kernel.org/doc/Documentation/sysctl/vm.txt describes the 'drop_caches' sysctl for this.
>
> Once the OS buffer cache is populated you could just let the list and tree caches populate themselves from there, and that might be best. But by definition you have enough tree cache to hold the entire database, so you could load it up with something like 'xdmp:eval("collection()")[last()]'. The xdmp:eval ensures a non-streaming context, necessary because streaming bypasses the tree caches.
>
> I can't really think of a good way to pre-populate the list cache. You could cts:search with random terms, but that would be slow and there would be no guarantee of completeness. Another approach would be to build a word-query of every word in the word lexicon, and then estimate it. That won't be complete either, since it won't exercise non-word terms. Probably it's best to ensure that the ListData is in the OS buffer cache, and leave it at that.
>
> -- Mike
>
> On 16 Feb 2013, at 10:50, Ron Hitchens <[email protected]> wrote:
>
>> I'm trying to work out the best way to deploy a system I'm designing into the cloud on AWS. We've been through various permutations of AWS configurations, and the main thing we've learned is that there is a lot of uncertainty and unpredictability around I/O performance in AWS.
>>
>> It's relatively expensive to provision guaranteed, high-performance I/O. We're testing an SSD solution at the moment, but that is ephemeral (lost if the VM shuts down) and very expensive. That's not a deal-killer for our architecture, but it makes deployment more complicated and strains the ops budget.
>>
>> RAM, on the other hand, is relatively cheap to add to an AWS instance. The total database size, at present, is under 20GB and will grow relatively slowly. Provisioning an AWS instance with ~64GB of RAM is fairly cost effective, but the persistent EBS storage is sloooow.
>>
>> So, I have two questions:
>>
>> 1) Is there a best practice for tuning MarkLogic where RAM is plentiful (twice the size of the data or more) so as to maximize caching of data? Ideally, we'd like the whole database loaded into RAM. This system will run as a read-only replica of a master database located elsewhere. The goal is to maximize query performance, but updates of relatively low frequency will be coming in from the master.
>>
>> The client is a Windows shop, but Linux is an approved solution if need be. Are there exploitable differences at the OS level that can improve filesystem caching? Are there RAM disk or configuration tricks that would maximize RAM usage without affecting update persistence?
>>
>> 2) Given that #1 could lead to a mostly RAM-based configuration, does it make sense to go with a single high-RAM, high-CPU E+D-node that serves all requests with little or no actual I/O? Or would it be an overall win to cluster E-nodes in front of the big-RAM D-node to offload query evaluation and pay the (10-Gb) network latency penalty for inter-node comms?
>>
>> We do have the option of deploying multiple standalone big-RAM E+D-nodes, each of which is a full replica of the data from the master. This would basically give us the equivalent of failover redundancy, but at the load balancer level rather than within the cluster.
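(Another note to self, on the warm-up suggestions above: a rough, untested sketch of the tree-cache and list-cache passes as one query, to run after the OS buffer cache has been primed. It assumes the database has a word lexicon enabled, and the lexicon pass is only a partial warm-up since it doesn't exercise non-word terms.)

    xquery version "1.0-ml";
    (: Untested warm-up sketch; run once after priming the OS buffer cache. :)
    (
      (: 1. Pull every fragment through the tree caches. Keeping [last()]
            outside the eval follows the one-liner above; the eval provides
            the non-streaming context, since streaming bypasses the caches. :)
      xdmp:eval("collection()")[last()],

      (: 2. Partially warm the list cache: estimate a word-query built from
            the entire word lexicon (assumes a word lexicon is enabled). :)
      xdmp:estimate(cts:search(collection(), cts:word-query(cts:words())))
    )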
>> This would also let us disperse them across AZs and regions without worrying about split-brain cluster issues.
>>
>> Thoughts? Recommendations?

---
Ron Hitchens {mailto:[email protected]}   Ronsoft Technologies
+44 7879 358 212 (voice)                 http://www.ronsoft.com
+1 707 924 3878 (fax)                    Bit Twiddling At Its Finest
"No amount of belief establishes any fact." -Unknown

_______________________________________________
General mailing list
[email protected]
http://developer.marklogic.com/mailman/listinfo/general
