Mike,

Here's my response to yours...

On Tue, May 21, 2013 at 12:56 PM, Michael Blakeley <[email protected]> wrote:
> I suppose everyone knew that Danny meant 600-GB and 200-GB, below? All
> good advice. If rebuilding your forests is not convenient, you might use
> https://github.com/mblakele/task-rebalancer after adding new forests.

Is using the task rebalancer the solution to "cut the forests into evenly
distributed forests"?

> For disk space, the rule of thumb is 3x the fully-merged size. That is, if
> you have a 200-GB forest it should be on a 600-GB filesystem. That's
> necessary for situations where you have your base 200-GB documents, plus
> nearly 200-GB of deleted fragments, and need to merge the whole thing.

It looks like I might have enough disk space since my forests are under
400MB at this time.

> Forest size also plays into the other rule of thumb Danny mentioned:
> 20-MB/sec per forest. If your disks and CPUs can sustain that, then merging
> 200-GB takes about 3-hr. If your forests can't sustain that merge rate,
> then merges will probably fall behind ingestion and the forest may hit the
> 64 stand limit.

How do I check how fast my forests are? I believe they told me 200Mb/sec
but I'm not sure yet.

> It can be quite difficult to get adequate I/O performance out of a
> virtualized environment. Moving to NAS may not help, and could easily hurt.
> But configurations vary widely so there are no hard and fast answers.

They are moving to NAS. How can it hurt my database?

> Check that the underlying OS is healthy. For example if the system is
> paging heavily and using large amounts of swap, that will directly affect
> database performance.

Where would I go to check how much my system is paging and swapping?

Again, as always, thank you very much for your comments.

BobO.

> -- Mike
>
> On 21 May 2013, at 09:21, Danny Sokolsky <[email protected]> wrote:
>
> > Hi Bob,
> >
> > Here are a few questions and a few things I would focus on:
> >
> > · I am confused about what version you are on: is it 5.0-4.1 for this
> > project 4.0?
> > · Is this production or development? If it is production, you
> > might consider contacting MarkLogic support.
> > · Do you have a test cluster? If not, I would make that a
> > priority so you can try stuff easily.
> > · 600MB forests sound very large. The rule-of-thumb for size is
> > max 200MB, so you are way off here. But the important number is how many
> > fragments are in the forests. You should be able to get that number from
> > the database status page (show forest info), or from the
> > xdmp:forest-status function.
> > · How big are your VMs? (how much memory)
> > · How many Range indexes?
> > · How is your I/O rate on the system? Ideally, it should be
> > capable of roughly 20Mb/sec per forest.
> > · As far as the logs, you can turn off log uncaught errors on
> > the App Server doing the loads (although you might need that info). The
> > more interesting question is why the loads are throwing errors.
> > · How many nodes in this cluster?
> >
> > Without much info, my guess is that finding some decent disk is a high
> > priority.
> >
> > That should give you a few things to scratch your head over.
> >
> > -Danny
> >
> > From: [email protected] [mailto:[email protected]] On Behalf Of Bob O
> > Sent: Tuesday, May 21, 2013 8:39 AM
> > To: [email protected]
> > Subject: [MarkLogic Dev General] ML Project Issues
> >
> > Hello Everyone,
> >
> > I am taking over a new project that I would consider large scale. I was
> > hired as a ML DBA but I am really fairly new at MarkLogic. We were using
> > ML4.0, and on this project they are using ML v5.0-4.1 and deploying the
> > product on VMs.
> >
> > They are running into a bunch of issues and I feel overwhelmed by it. I
> > have seen some of it before, but some of the issues are these:
> >
> > 1) logging issue: every time their ingestion errors out, it logs
> > everything about it, which amounts to about 2Mb every time it happens.
> > This happens quite often and they are getting tons of logs in a short
> > period of time.
> > Is there a way to minimize what the logs spit out and cut down
> > the extra unnecessary information?
> >
> > 2) ingestion is slow: this could be anything that's causing the
> > ingestion to be so slow. Where should I look for the cause? I have
> > contacted the SW Developer on the ingestion process and am still waiting
> > for his response. I am told that they are using an in-house app called
> > DDMS that I am not familiar with.
> >
> > 3) forest space: how do I check if their forest space is enough? They
> > have 4 forests and are around 600GB apiece. Is there a formula to
> > properly figure out the space allocation for each forest and to plan for
> > future use?
> >
> > 4) performance issues: they are experiencing some latency issues, CPU-IO
> > scheduler, and they're fixing to buy NAS servers for their storage
> > management.
> >
> > I apologize for dropping all of these issues at once but I figure there
> > are more brains out there than this one. I feel I have taken on a much
> > bigger task and role than I could handle. I appreciate any assistance or
> > direction anyone can give.
> >
> > --BobO
> > _______________________________________________
> > General mailing list
> > [email protected]
> > http://developer.marklogic.com/mailman/listinfo/general
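For anyone wanting to plug their own forest sizes into the two rules of thumb from this thread (3x the fully-merged size for disk, 20 MB/sec per forest for merges), here is a rough sketch in Python. The function and constant names are made up for illustration, it assumes 1 GB = 1000 MB, and it is only the arithmetic, not anything MarkLogic itself provides.

```python
# Rule-of-thumb arithmetic from the thread; all names here are hypothetical.

DISK_MULTIPLIER = 3          # filesystem should be about 3x the fully-merged size
MERGE_RATE_MB_PER_SEC = 20   # sustained I/O rule of thumb, per forest


def required_filesystem_gb(forest_gb: float) -> float:
    """Filesystem size suggested by the 3x rule of thumb."""
    return forest_gb * DISK_MULTIPLIER


def merge_hours(forest_gb: float,
                rate_mb_per_sec: float = MERGE_RATE_MB_PER_SEC) -> float:
    """Approximate hours to merge a whole forest (assumes 1 GB = 1000 MB)."""
    seconds = forest_gb * 1000 / rate_mb_per_sec
    return seconds / 3600


if __name__ == "__main__":
    for size_gb in (200, 600):
        print(f"{size_gb} GB forest: "
              f"{required_filesystem_gb(size_gb):.0f} GB filesystem, "
              f"~{merge_hours(size_gb):.1f} h to merge "
              f"at {MERGE_RATE_MB_PER_SEC} MB/sec")
```

For the 200-GB example this gives a 600-GB filesystem and a little under 3 hours per full merge, which matches the numbers quoted above. A 600-GB forest at the same rate works out to more than 8 hours.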
