Ron, I believe that if you *can* fit your application onto one host without ridiculous hardware costs, you should. There are many actual and potential optimizations that apply to local forests but break down once the forests are spread across different hosts.
On Linux vs Windows: I think you'll have more control on Linux. You can tune the OS I/O caching and swappiness, as well as dirty-page management. It might also help to know that default OS I/O caching behaves quite differently between the two. I've done tests where I set the compressed tree cache as low as 1-MB on Linux without measurable performance impact: the OS buffer cache already does pretty much the same job, so it picks up the slack. As I understand it, that would not be true on Windows.

Getting everything into RAM is an interesting problem. You can turn on "preload mapped data" for the database, although that makes forest mounts quite I/O-intensive. If possible, use huge pages so the OS can more easily manage all the RAM you're planning to use.

Next I think you'll want to figure out how your forests break out into mappable files, ListData, and TreeData (a sketch for measuring that is appended below). You'll need available RAM for the mappable files, a list cache equal to the ListData, a compressed tree cache equal to the TreeData, and an expanded tree cache of about 3x the TreeData. Often ListData is something like 75% of forest size and TreeData something like 20%, with mappable files making up the rest - but your database will likely be a little different, so measure it. With those proportions, a 10000-MiB database would call for something like:

  list cache              7500 MiB  (1x ListData)
  compressed tree cache   2000 MiB  (1x TreeData)
  expanded tree cache     6000 MiB  (3x TreeData)
  mapped files             500 MiB
  ----------------------------------
  total                  16000 MiB  (about 1.6x database size)

Ideally you'd have *another* 1x so that the OS buffer cache could also hold all the files. You'll also want to account for the OS, merges, and in-memory stands for those infrequent updates. I expect 32-GB would be more than enough for this example. As noted above, you could skimp on the compressed tree cache if the OS buffer cache will do something similar, but that's a marginal savings.

At some point in all this, you'll want to copy all the data into RAM. Slow I/O will impede that, especially if everything is done using random reads. One way to manage this is a ramdisk, but that has the disadvantage of requiring another large slice of memory: probably more than 2x, to allow merge space, since you will have some updates. You could easily load up the OS buffer cache with something like 'find /mldata | xargs wc', as long as the buffer cache doesn't outsmart you. On Linux, where fadvise is available, using 'willneed' (POSIX_FADV_WILLNEED) should help. This approach doesn't load the tree caches, but database reads should come from the buffer cache instead of driving more I/O. If you're testing the buffer cache on Linux, it helps to know how to clear it: http://www.kernel.org/doc/Documentation/sysctl/vm.txt describes the 'drop_caches' sysctl for this.

Once the OS buffer cache is populated you could just let the list and tree caches populate themselves from there, and that might be best. But by definition you have enough tree cache to hold the entire database, so you could load it up with something like 'xdmp:eval("collection()")[last()]' (see the sketch below). The xdmp:eval ensures a non-streaming context, which is necessary because streaming bypasses the tree caches.

I can't really think of a good way to pre-populate the list cache. You could run cts:search with random terms, but that would be slow and there would be no guarantee of completeness. Another approach would be to build a word-query of every word in the word lexicon, and then estimate it (also sketched below). That won't be complete either, since it won't exercise non-word terms. Probably it's best to ensure that the ListData is in the OS buffer cache, and leave it at that.
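Here are those sketches. First, measuring the ListData/TreeData breakdown. This is a rough sketch only: it assumes local forests and the filesystem-access privilege, and the exact element names in the forest-status and directory output may differ between releases, so check against your version.

  xquery version "1.0-ml";
  (: Rough sketch: sum on-disk ListData and TreeData sizes per forest.
     Assumes local forests and the filesystem-access privilege. :)
  for $f in xdmp:forests()
  let $paths :=
    for $s in xdmp:forest-status($f)//*:stand/*:path
    return fn:string($s)
  let $entries :=
    for $p in $paths
    return xdmp:filesystem-directory($p)//*:entry
  return
    <forest name="{xdmp:forest-name($f)}">
      <list-data-bytes>{
        fn:sum($entries[*:filename eq "ListData"]/*:content-length)
      }</list-data-bytes>
      <tree-data-bytes>{
        fn:sum($entries[*:filename eq "TreeData"]/*:content-length)
      }</tree-data-bytes>
    </forest>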
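If you'd rather script the cache sizing than click through the admin UI, something like this should work via the Admin API. The numbers are just the worked example above, and "Default" is an assumed group name - adjust both for your cluster. As I recall, cache-size changes need a restart to take effect.

  xquery version "1.0-ml";
  import module namespace admin = "http://marklogic.com/xdmp/admin"
      at "/MarkLogic/admin.xqy";
  (: Sketch: apply the example sizing above to the "Default" group.
     Sizes are in MB; saving cache-size changes triggers a restart. :)
  let $cfg := admin:get-configuration()
  let $gid := admin:group-get-id($cfg, "Default")
  let $cfg := admin:group-set-list-cache-size($cfg, $gid, 7500)
  let $cfg := admin:group-set-compressed-tree-cache-size($cfg, $gid, 2000)
  let $cfg := admin:group-set-expanded-tree-cache-size($cfg, $gid, 6000)
  return admin:save-configuration($cfg)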
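To force the tree caches, here is the one-liner from above, padded out a bit. The time-limit call is only so that a large database doesn't hit the default request limit (it's still capped by the group's max time limit); the [last()] keeps the returned result small while the non-streaming eval materializes every fragment.

  xquery version "1.0-ml";
  (: Sketch: warm the tree caches by materializing every fragment.
     xdmp:eval gives a non-streaming context; streaming would
     bypass the tree caches. :)
  xdmp:set-request-time-limit(3600),
  xdmp:eval("collection()")[last()]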
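And the word-lexicon trick for the list cache. This assumes the database has its word lexicon enabled; cts:words() with no arguments walks the whole lexicon, which can itself be slow on a big database, and as noted it won't touch non-word term lists.

  xquery version "1.0-ml";
  (: Sketch: estimate one word-query over every lexicon word.
     xdmp:estimate resolves the query from the indexes alone,
     so it reads term lists without fetching any documents. :)
  xdmp:estimate(
    cts:search(fn:collection(), cts:word-query(cts:words()))
  )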
-- Mike

On 16 Feb 2013, at 10:50, Ron Hitchens <[email protected]> wrote:

> I'm trying to work out the best way to deploy a system
> I'm designing into the cloud on AWS. We've been through
> various permutations of AWS configurations and the main
> thing we've learned is that there is a lot of uncertainty
> and unpredictability around I/O performance in AWS.
>
> It's relatively expensive to provision guaranteed, high
> performance I/O. We're testing an SSD solution at the
> moment, but that is ephemeral (lost if the VM shuts down)
> and very expensive. That's not a deal-killer for our
> architecture, but makes it more complicated to deploy
> and strains the ops budget.
>
> RAM, on the other hand, is relatively cheap to add to
> an AWS instance. The total database size, at present, is
> under 20GB and will grow relatively slowly. Provisioning
> an AWS instance with ~64GB of RAM is fairly cost effective,
> but the persistent EBS storage is sloooow.
>
> So, I have two questions:
>
> 1) Is there a best practice to tune MarkLogic where
> RAM is plentiful (twice the size of the data or more) so
> as to maximize caching of data? Ideally, we'd like the
> whole database loaded into RAM. This system will run as
> a read-only replica of a master database located elsewhere.
> The goal is to maximize query performance, but updates of
> relatively low frequency will be coming in from the master.
>
> The client is a Windows shop, but Linux is an approved
> solution if need be. Are there exploitable differences at
> the OS level that can improve filesystem caching? Are there
> RAM disk or configuration tricks that would maximize RAM
> usage without affecting update persistence?
>
> 2) Given #1 could lead to a mostly RAM-based configuration,
> does it make sense to go with a single high-RAM, high-CPU
> E+D-node that serves all requests with little or no actual I/O?
> Or would it be an overall win to cluster E-nodes in front of
> the big-RAM D-node to offload query evaluation and pay the
> (10-gb) network latency penalty for inter-node comms?
>
> We do have the option of deploying multiple standalone
> big-RAM E+D-nodes, each of which is a full replica of the data
> from the master. This would basically give us the equivalent
> of failover redundancy, but at the load balancer level rather
> than within the cluster. This would also let us disperse
> them across AZs and regions without worrying about split-brain
> cluster issues.
>
> Thoughts? Recommendations?
>
> ---
> Ron Hitchens {mailto:[email protected]}  Ronsoft Technologies
> +44 7879 358 212 (voice)  http://www.ronsoft.com
> +1 707 924 3878 (fax)  Bit Twiddling At Its Finest
> "No amount of belief establishes any fact." -Unknown
