It's pretty difficult to estimate E vs. D performance without deep knowledge of the queries. But you could probably set up a testbed and measure it directly.
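For example, a crude harness like this (a sketch only: the host, port 8011, the credentials, and the test.xqy module are placeholders for whatever your app exposes) could be pointed first at a combined E+D host and then at an E-node in front of it, stepping up the concurrency each time and comparing the slowest response times:

    # Hypothetical load harness: step up concurrency and report the
    # three slowest response times at each level. Host, port,
    # credentials, and test.xqy are all placeholders.
    for P in 1 2 4 8 16 32; do
        echo "concurrency $P:"
        seq 1 200 |
            xargs -P "$P" -I{} curl -s -o /dev/null -w '%{time_total}\n' \
                -H 'X-Req: {}' --digest -u admin:admin \
                'http://target-host:8011/test.xqy' |
            sort -n | tail -3
    done

Where the slowest times start climbing as you raise the concurrency is roughly where that topology runs out of CPU.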
In the absence of data, I would plan on using an elastic load balancer (ELB) from day one. That way you can experiment with E-hosts without breaking anything, and also add replicas without breaking anything. Plus it's convenient to have that reliable, external DNS entry.

-- Mike

On 16 Feb 2013, at 13:47, Ron Hitchens <[email protected]> wrote:

> While I'm at it, another question. Assuming it works well to set up a single E+D big-RAM system that runs super fast because it does virtually no I/O, where is the upper limit to scalability as concurrent requests go up?
>
> Such a system in AWS would probably have 8-16 cores. Even with nearly frictionless data access, there will be an upper bound to how many queries can be evaluated in a given unit of time. How do we determine the crossover point where it's better to add E-nodes to spread the CPU load and let the big-RAM guy focus on data access?
>
> Would it make sense to go with n*E + 1*D from the start, where n can be dialed up and down easily? Or go with one monolithic E+D and just replicate it as the load goes up? The usage profile is likely to have peaks and valleys at fairly predictable times during the day/week.
>
> On Feb 16, 2013, at 8:37 PM, Michael Blakeley <[email protected]> wrote:
>
>> Ron, I believe that if you *can* fit your application onto one host without ridiculous hardware costs, you should. There are many actual and potential optimizations that can be done for local forests, but these break down once the forests are on different hosts.
>>
>> On Linux vs. Windows: I think you'll have more control on Linux. You can tune the OS I/O caching and swappiness, as well as dirty page management. It might also help to know that default OS I/O caching behaves quite a bit differently between the two. I've done tests where I've set the compressed tree cache size as low as 1 MB on Linux without measurable performance impact. The OS buffer cache already does pretty much the same job, so it picks up the slack. As I understand it, this would not be true on Windows.
>>
>> Getting everything into RAM is an interesting problem. You can turn on "preload mapped data" for the database, although that makes forest mounts quite I/O intensive. If possible, use huge pages so the OS can more easily manage all the RAM you're planning to use. Next I think you'll want to figure out how your forests break out into mappable files, ListData, and TreeData. You'll need available RAM for the mappable files, a list cache equal to the ListData, a compressed tree cache equal to the TreeData, and an expanded tree cache equal to about 3x the TreeData. Often ListData is something like 75% of forest size, TreeData is something like 20%, and mappable files make up the rest - but your database will likely be a little different, so measure it.
>>
>> With those proportions, a 10000-MiB database would call for something like a 7500-MiB list cache, a 2000-MiB compressed tree cache, and a 6000-MiB expanded tree cache, plus 500 MiB for mapped files: in total, about 1.6x the database size. Ideally you'd have *another* 1x so that the OS buffer cache could also cache all the files. You'll also want to account for the OS, merges, and in-memory stands for those infrequent updates. I expect 32 GB would be more than enough for this example. As noted above, you could skimp on the compressed tree cache if the OS buffer cache will do something similar, but that's a marginal savings.
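>> Your forests will tell you the actual split. A quick sketch for measuring it on Linux (the forest path is just an example, and this assumes GNU coreutils; ListData and TreeData are the per-stand file names):
>>
>>     # ListData and TreeData are files in each stand under the forest
>>     # directory; the path below is just an example -- adjust to yours.
>>     F=/var/opt/MarkLogic/Forests/my-forest
>>     list=$(find "$F" -type f -name ListData -print0 |
>>            du -cm --files0-from=- | awk 'END {print $1}')
>>     tree=$(find "$F" -type f -name TreeData -print0 |
>>            du -cm --files0-from=- | awk 'END {print $1}')
>>     total=$(du -sm "$F" | awk '{print $1}')
>>     echo "list cache:            ${list} MiB"
>>     echo "compressed tree cache: ${tree} MiB"
>>     echo "expanded tree cache:   $((tree * 3)) MiB"
>>     echo "mapped/other files:    $((total - list - tree)) MiB"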
>> At some point in all this, you'll want to copy all the data into RAM. Slow I/O will impede that, especially if everything is done using random reads. One way to manage this is to use a ramdisk, but that has the disadvantage of requiring another large slice of memory: probably more than 2x to allow merge space, since you will have some updates. You could easily load up the OS buffer cache with something like 'find /mldata | xargs wc', as long as the buffer cache doesn't outsmart you. On Linux, where fadvise is available, using 'willneed' should help. This approach doesn't load the tree caches, but database reads should come from the buffer cache instead of driving more I/O.
>>
>> If you're testing the buffer cache on Linux, it helps to know how to clear it: http://www.kernel.org/doc/Documentation/sysctl/vm.txt describes the 'drop_caches' sysctl for this.
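>> For example, a minimal warm-then-reset cycle for testing might look like this (a sketch: the default Linux forest location is assumed, and 'cat' is just one way to read every byte; a tool like vmtouch can do the same job and also report cache residency):
>>
>>     # Warm the OS buffer cache by reading every forest file once.
>>     find /var/opt/MarkLogic/Forests -type f -print0 |
>>         xargs -0 cat > /dev/null
>>
>>     # Between test runs, flush dirty pages and drop the page cache
>>     # so each measurement starts cold (see the vm.txt doc above).
>>     sync
>>     echo 3 | sudo tee /proc/sys/vm/drop_caches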
>> Once the OS buffer cache is populated you could just let the list and tree caches populate themselves from there, and that might be best. But by definition you have enough tree cache to hold the entire database, so you could load it up with something like 'xdmp:eval("collection()")[last()]'. The xdmp:eval ensures a non-streaming context, necessary because streaming bypasses the tree caches.
>>
>> I can't really think of a good way to pre-populate the list cache. You could cts:search with random terms, but that would be slow and there would be no guarantee of completeness. Another approach would be to build a word-query of every word in the word lexicon, and then estimate it. That won't be complete either, since it won't exercise non-word terms. Probably it's best to ensure that the ListData is in the OS buffer cache, and leave it at that.
>>
>> -- Mike
>>
>> On 16 Feb 2013, at 10:50, Ron Hitchens <[email protected]> wrote:
>>
>>> I'm trying to work out the best way to deploy a system I'm designing into the cloud on AWS. We've been through various permutations of AWS configurations, and the main thing we've learned is that there is a lot of uncertainty and unpredictability around I/O performance in AWS.
>>>
>>> It's relatively expensive to provision guaranteed, high-performance I/O. We're testing an SSD solution at the moment, but that is ephemeral (lost if the VM shuts down) and very expensive. That's not a deal-killer for our architecture, but it makes deployment more complicated and strains the ops budget.
>>>
>>> RAM, on the other hand, is relatively cheap to add to an AWS instance. The total database size, at present, is under 20 GB and will grow relatively slowly. Provisioning an AWS instance with ~64 GB of RAM is fairly cost-effective, but the persistent EBS storage is sloooow.
>>>
>>> So, I have two questions:
>>>
>>> 1) Is there a best practice for tuning MarkLogic where RAM is plentiful (twice the size of the data or more) so as to maximize caching of data? Ideally, we'd like the whole database loaded into RAM. This system will run as a read-only replica of a master database located elsewhere. The goal is to maximize query performance, but updates of relatively low frequency will be coming in from the master.
>>>
>>> The client is a Windows shop, but Linux is an approved solution if need be. Are there exploitable differences at the OS level that can improve filesystem caching? Are there RAM disk or configuration tricks that would maximize RAM usage without affecting update persistence?
>>>
>>> 2) Given that #1 could lead to a mostly RAM-based configuration, does it make sense to go with a single high-RAM, high-CPU E+D-node that serves all requests with little or no actual I/O? Or would it be an overall win to cluster E-nodes in front of the big-RAM D-node to offload query evaluation and pay the (10 GbE) network latency penalty for inter-node comms?
>>>
>>> We do have the option of deploying multiple standalone big-RAM E+D-nodes, each of which is a full replica of the data from the master. This would basically give us the equivalent of failover redundancy, but at the load-balancer level rather than within the cluster. This would also let us disperse them across AZs and regions without worrying about split-brain cluster issues.
>>>
>>> Thoughts? Recommendations?
>>>
>>> ---
>>> Ron Hitchens {mailto:[email protected]}   Ronsoft Technologies
>>> +44 7879 358 212 (voice)                 http://www.ronsoft.com
>>> +1 707 924 3878 (fax)                    Bit Twiddling At Its Finest
>>> "No amount of belief establishes any fact." -Unknown
