Thanks for listening to what I meant and not what I said, Mike (I was only off 
by a factor of 1024...).  Yes, I meant GB.

-Danny

-----Original Message-----
From: [email protected] 
[mailto:[email protected]] On Behalf Of Michael Blakeley
Sent: Tuesday, May 21, 2013 10:57 AM
To: MarkLogic Developer Discussion
Subject: Re: [MarkLogic Dev General] ML Project Issues

I suppose everyone knew that Danny meant 600-GB and 200-GB, below? All good 
advice. If rebuilding your forests is not convenient, you might use 
https://github.com/mblakele/task-rebalancer after adding new forests.

For disk space, the rule of thumb is 3x the fully-merged size. That is, if you 
have a 200-GB forest it should be on a 600-GB filesystem. That's necessary for 
situations where you have your base 200-GB documents, plus nearly 200-GB of 
deleted fragments, and need to merge the whole thing.
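
As a rough illustration of that rule of thumb, the arithmetic can be sketched 
in Python. The function names and the default multiplier of 3 are illustrative 
assumptions taken from the paragraph above; nothing here is a MarkLogic API.

```python
def required_filesystem_gb(merged_forest_gb, multiplier=3.0):
    """Filesystem size suggested by the 3x rule of thumb.

    merged_forest_gb is the fully-merged forest size in GB. The multiplier
    leaves room for the base documents, a similar volume of deleted
    fragments, and the output of a merge over the whole forest.
    """
    if merged_forest_gb < 0:
        raise ValueError("forest size must be non-negative")
    return merged_forest_gb * multiplier


def has_merge_headroom(merged_forest_gb, filesystem_gb, multiplier=3.0):
    """Return True if the filesystem meets the rule of thumb for this forest."""
    return filesystem_gb >= required_filesystem_gb(merged_forest_gb, multiplier)
```

With these definitions, required_filesystem_gb(200) gives 600.0, matching the 
200-GB example, while a 600-GB forest would call for roughly 1800 GB.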

Forest size also plays into the other rule of thumb Danny mentioned: 20-MB/sec 
per forest. If your disks and CPUs can sustain that, then merging 200-GB takes 
about 3-hr. If your forests can't sustain that merge rate, then merges will 
probably fall behind ingestion and the forest may hit the 64 stand limit.
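
The 3-hr figure follows from simple division, sketched below in Python. The 
function name is hypothetical, and the sketch assumes 1 GB = 1024 MB and a 
merge that runs at a constant rate, which real merges will only approximate.

```python
def merge_hours(forest_gb, rate_mb_per_sec=20.0):
    """Estimate the hours needed to merge a forest at a sustained rate.

    forest_gb: forest size in GB (assumed 1 GB = 1024 MB).
    rate_mb_per_sec: sustained merge throughput in MB/sec.
    """
    if rate_mb_per_sec <= 0:
        raise ValueError("merge rate must be positive")
    seconds = forest_gb * 1024 / rate_mb_per_sec
    return seconds / 3600
```

At 20 MB/sec, merge_hours(200) comes to about 2.8 hours. Halving the rate to 
10 MB/sec doubles the estimate, which is how merges start to fall behind 
ingestion on slow storage.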

It can be quite difficult to get adequate I/O performance out of a virtualized 
environment. Moving to NAS may not help, and could easily hurt. But 
configurations vary widely so there are no hard and fast answers.

Check that the underlying OS is healthy. For example, if the system is paging 
heavily and using large amounts of swap, that will directly affect database 
performance.
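
One way to get a quick reading on swap use is sketched below in Python. It 
assumes a Linux host, where /proc/meminfo reports SwapTotal and SwapFree in 
kB; the function names are hypothetical. Swap in use is only a rough 
indicator, and active paging is better watched over time with a tool such as 
vmstat.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of field name -> integer."""
    values = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # skip lines that are not "Name: value" pairs
        fields = rest.split()
        if fields and fields[0].isdigit():
            values[name.strip()] = int(fields[0])
    return values


def swap_used_fraction(text):
    """Return the fraction of swap in use, or 0.0 if no swap is configured."""
    info = parse_meminfo(text)
    total = info.get("SwapTotal", 0)
    if total == 0:
        return 0.0
    return (total - info.get("SwapFree", 0)) / total
```

On a Linux host, passing the contents of /proc/meminfo to swap_used_fraction 
returns a value between 0 and 1. A consistently high value on a database host 
suggests the system is short of memory.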

-- Mike

On 21 May 2013, at 09:21, Danny Sokolsky <[email protected]> wrote:

> Hi Bob,
>  
> Here are a few questions and a few things I would focus on:
>  
> * I am confused about what version you are on - is it 5.0-4.1 for this 
>   project, or 4.0?
> * Is this production or development?  If it is production, you might 
>   consider contacting MarkLogic support.
> * Do you have a test cluster?  If not, I would make that a priority so you 
>   can try stuff easily.
> * 600MB forests sound very large.  The rule-of-thumb for size is max 200MB, 
>   so you are way off here.  But the important number is how many fragments 
>   are in the forests.  You should be able to get that number from the 
>   database status page (show forest info), or from the xdmp:forest-status 
>   function.
> * How big are your VMs (how much memory)?
> * How many range indexes?
> * How is your I/O rate on the system?  Ideally, it should be capable of 
>   roughly 20MB/sec per forest.
> * As far as the logs, you can turn off log uncaught errors on the App 
>   Server doing the loads (although you might need that info).  The more 
>   interesting question is why the loads are throwing errors.
> * How many nodes are in this cluster?
>  
> Without much info, my guess is that finding some decent disk is a high 
> priority.
>  
> That should give you a few things to scratch your head over.
>  
> -Danny
>  
> From: [email protected] 
> [mailto:[email protected]] On Behalf Of Bob O
> Sent: Tuesday, May 21, 2013 8:39 AM
> To: [email protected]
> Subject: [MarkLogic Dev General] ML Project Issues
>  
> Hello Everyone,
>  
> I am taking over a new project that I would consider large scale. I was hired 
> as a ML DBA but I am really fairly new at MarkLogic. We were using ML 4.0, and 
> on this project they are using ML v5.0-4.1 and deploying the product on VMs.
>  
> They are running into a bunch of issues and I feel overwhelmed by it. I have 
> seen some of it before but some of the issues are these:
> 1) logging issue: every time their ingestion errors out, it logs 
> everything about it, which amounts to about 2Mb every time it happens. This 
> happens quite often and they are getting tons of logs in a short period of 
> time. Is there a way to minimize what the logs spit out and cut down 
> the extra unnecessary information?
>  
> 2) ingestion is slow: this could be anything that's causing the ingestion to 
> be so slow. Where should I look for the cause? I have contacted the SW 
> Developer on the ingestion process and am still waiting for his response. I 
> am told that they are using an in-house app called DDMS that I am not 
> familiar with.
>  
> 3) forest space: how do I check if their forest space is enough? They have 4 
> forests, each around 600GB. Is there a formula to properly figure 
> out the space allocation for each forest and to plan for future use?
>  
> 4) performance issues: they are experiencing some latency issues, CPU-IO 
> scheduler, and they're fixing to buy NAS servers for their storage management.
>  
> I apologize for dropping all of these issues at once but I figure there are 
> more brains out there than this one. I feel I have taken on a much bigger 
> task and role than I could handle. I appreciate any assistance or direction 
> anyone can give.
>  
> --BobO
> _______________________________________________
> General mailing list
> [email protected]
> http://developer.marklogic.com/mailman/listinfo/general
