On 21 May 2013, at 14:33, Bob O <[email protected]> wrote:

>
>> Here's my response to yours...
>
> On Tue, May 21, 2013 at 12:56 PM, Michael Blakeley <[email protected]> wrote:
> I suppose everyone knew that Danny meant 600-GB and 200-GB, below? All good
> advice. If rebuilding your forests is not convenient, you might use
> https://github.com/mblakele/task-rebalancer after adding new forests.
>> IS using the task rebalancer the solution to "cut the forests into evenly
>> distributed forests" ?
Yes, but it sounds like you don't need to worry about that right now.

> For disk space, the rule of thumb is 3x the fully-merged size. That is, if
> you have a 200-GB forest it should be on a 600-GB filesystem. That's
> necessary for situations where you have your base 200-GB documents, plus
> nearly 200-GB of deleted fragments, and need to merge the whole thing.
>> It looks like I might have enough disk space since my forest are under 400MB
>> at this time.

Agreed.

> Forest size also plays into the other rule of thumb Danny mentioned:
> 20-MB/sec per forest. If your disks and CPUs can sustain that, then merging
> 200-GB takes about 3-hr. If your forests can't sustain that merge rate, then
> merges will probably fall behind ingestion and the forest may hit the 64
> stand limit.
>> How do I check how fast my forests are? I believe they told me 200Mb/sec but
>> I'm not sure yet.

You can force a merge of all forests in the database. Use the admin UI:
Databases > database-name > Merge. Then check the database status to watch
the merges in real time, and check ErrorLog.txt for 'Merged...' messages.
However, with only 400-MB forests you may not have a good test. On the bright
side, with 400-MB forests I think disk I/O is less likely to be a problem.

> It can be quite difficult to get adequate I/O performance out of a
> virtualized environment. Moving to NAS may not help, and could easily hurt.
> But configurations vary widely so there are no hard and fast answers.
>> They are moving to NAS. How can it hurt my database.

That's a fairly complex topic. You may not need to worry about it until you
have hundreds of GB of forest data, but here are some quick thoughts.

It's possible to design local storage so that it is slower or faster than a
given NAS. It's possible to design NAS so that it is slower or faster than
given local storage. But in most cases it is cheaper to build out similar
levels of performance from local disk than from NAS (or SAN). I prefer to use
local storage.
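
The two rules of thumb quoted in this reply (about 3x the fully-merged size in disk space, and about 20-MB/sec of merge throughput per forest) reduce to simple arithmetic. Below is a minimal Python sketch of that arithmetic; the function names are hypothetical, and it assumes 1 GB = 1000 MB (using 1024 gives about 2.84 hours instead of 2.78, still "about 3-hr"). These are heuristics from the thread, not limits that MarkLogic enforces.

```python
# Back-of-envelope sizing for the two rules of thumb discussed in the thread.
# Assumption: 1 GB = 1000 MB. These are heuristics, not server-enforced limits.

DISK_FACTOR = 3           # filesystem ~= 3x the fully-merged forest size
MERGE_RATE_MB_PER_S = 20  # sustained merge throughput per forest


def required_filesystem_gb(forest_gb: float, factor: float = DISK_FACTOR) -> float:
    """Filesystem size (GB) needed to hold and fully merge a forest of forest_gb."""
    return forest_gb * factor


def merge_hours(forest_gb: float, rate_mb_per_s: float = MERGE_RATE_MB_PER_S) -> float:
    """Hours needed to merge forest_gb at a sustained rate_mb_per_s."""
    seconds = forest_gb * 1000 / rate_mb_per_s
    return seconds / 3600


if __name__ == "__main__":
    print(required_filesystem_gb(200))               # 600
    print(round(merge_hours(200), 2))                # 2.78
    print(round(required_filesystem_gb(0.4), 2))     # 1.2 (a 400-MB forest)
```

If a forest cannot sustain the assumed rate, merge_hours grows proportionally, which is when merges risk falling behind ingestion.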
This also avoids the strong probability that the I/O demands of the cluster
will swamp the network link to the NAS, or the NAS controller. Of course the
local controller or disks can still be a bottleneck. But it's usually easier
to fix local storage than it is to fix a SAN or NAS.

> Check that the underlying OS is healthy. For example if the system is paging
> heavily and using large amounts of swap, that will directly affect database
> performance.
>> Where would I go to check how my system's paging and swapping factors are?

That depends on the OS. With Linux or Solaris I would start with 'top'. With
Windows I believe the Task Manager has some virtual memory data, or there is
always the perfmon application.

You also mentioned this:

>> We have 4 forests (I'll call them F1, F2, F3, F4). Their sizes are 362MB,
>> 348Mb, 340MB, and 345Mb respectively. Each forest has over 48,000 fragments
>> with their "deleted fragments" ranging from 9500 to 11,100. Each forest
>> averages 25,000+ documents.

Those sound like quite large documents, which could be the root of the
problem. Accounting for the deleted fragments, the average fragment size is
something like 6-MB. If half of those are property fragments then the average
document size is probably twice that. Unless these are binary nodes, that
probably means the storage model needs work.

You probably can't go into the details of your content or queries on this
discussion list, so I would follow Danny's advice and contact MarkLogic
support.

-- Mike

_______________________________________________
General mailing list
[email protected]
http://developer.marklogic.com/mailman/listinfo/general
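
For the Linux case of the paging question, the swap figures that 'top' displays come from /proc/meminfo. Below is a minimal Python sketch, assuming a Linux host, that reads the SwapTotal and SwapFree fields (both reported in kB). The helper names are hypothetical. Swap in use only shows that memory was paged out at some point; to see active paging, the si/so columns of `vmstat 5` are the usual next step.

```python
# Minimal sketch, Linux only: report swap usage from /proc/meminfo.
from pathlib import Path


def parse_meminfo(text: str) -> dict[str, int]:
    """Parse /proc/meminfo-style lines ("Key:   1234 kB") into {key: value}."""
    values: dict[str, int] = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue
        fields = rest.split()
        if fields and fields[0].isdigit():
            values[key.strip()] = int(fields[0])
    return values


def swap_used_kb(meminfo: dict[str, int]) -> int:
    """Swap in use, in kB: SwapTotal minus SwapFree (0 if the fields are absent)."""
    return meminfo.get("SwapTotal", 0) - meminfo.get("SwapFree", 0)


if __name__ == "__main__":
    path = Path("/proc/meminfo")
    if path.exists():
        info = parse_meminfo(path.read_text())
        print(f"swap in use: {swap_used_kb(info)} kB of {info.get('SwapTotal', 0)} kB")
    else:
        print("/proc/meminfo not found; this sketch only covers Linux")
```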
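
The average-fragment-size estimate is forest size divided by all fragments on disk, active plus deleted. Below is a minimal Python sketch of that division, assuming the "over 48,000" count excludes deleted fragments and using a hypothetical 10,000 deleted fragments per forest (the reported range is 9,500 to 11,100). The result comes out near 6 in the unit one step below the forest-size unit: about 6 MB per fragment if the sizes are GB, about 6 KB if they are MB as written.

```python
# Back-of-envelope check of average fragment size per forest.
# Assumptions: 48,000 active fragments per forest (excluding deleted ones) and a
# hypothetical 10,000 deleted fragments per forest (reported range 9,500-11,100).

FOREST_SIZES = {"F1": 362, "F2": 348, "F3": 340, "F4": 345}  # as reported in the thread
ACTIVE_FRAGMENTS = 48_000
DELETED_FRAGMENTS = 10_000  # hypothetical midpoint of the reported range


def avg_fragment_size(forest_size: float, active: int, deleted: int) -> float:
    """Forest size divided by every fragment on disk (active + deleted)."""
    return forest_size / (active + deleted)


if __name__ == "__main__":
    for name, size in FOREST_SIZES.items():
        per_fragment = avg_fragment_size(size, ACTIVE_FRAGMENTS, DELETED_FRAGMENTS)
        # Multiplying by 1000 expresses the result one unit smaller (GB -> MB, MB -> KB).
        print(f"{name}: {per_fragment * 1000:.1f} (MB if sizes are GB, KB if sizes are MB)")
```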
