OK, so it sounds like your forest sizes actually are in MB, not GB (unless you 
made the same mistake I did :), so that does not sound so bad.  Given that you 
are using so many range indexes and that you only have a 6GB slice, you might 
be using up a lot of memory.  Tools like top can help you figure that out.

I would concentrate on understanding your loading issues.  You need to 
figure out what the code is doing and how it is structured, and get some 
concrete numbers for how fast or slow it is.  For example, you might be 
locking more documents than you need to during load.  So try to get a test 
case that shows your issues, then you can start to whittle it down.  That is 
where having your own sandbox can help.
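
To put concrete numbers on load speed, a small timing harness along these 
lines might help.  This is only a sketch under assumptions: it is not the 
DDMS loader or any MarkLogic API.  The names time_load, batches, and 
fake_load_batch are made up for illustration, and fake_load_batch is a 
stand-in for whatever call actually inserts the documents.

```python
import time
from typing import Callable, Iterator, List, Sequence, Tuple


def batches(items: Sequence[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of `items`, each at most `size` long."""
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


def time_load(
    doc_uris: Sequence[str],
    load_batch: Callable[[List[str]], None],
    batch_size: int = 100,
) -> Tuple[int, float, float]:
    """Load all URIs in batches; return (docs loaded, seconds, docs/sec)."""
    started = time.perf_counter()
    loaded = 0
    for batch in batches(doc_uris, batch_size):
        load_batch(batch)  # hypothetical: replace with the real insert call
        loaded += len(batch)
    elapsed = time.perf_counter() - started
    rate = loaded / elapsed if elapsed > 0 else float("inf")
    return loaded, elapsed, rate


def fake_load_batch(batch: List[str]) -> None:
    """Stand-in loader (assumption): sleeps instead of talking to a server."""
    time.sleep(0.001)


if __name__ == "__main__":
    uris = [f"/test/doc-{i}.xml" for i in range(1000)]
    # Compare a few batch sizes to see how transaction size affects the rate.
    for size in (10, 100, 500):
        loaded, elapsed, rate = time_load(uris, fake_load_batch, size)
        print(f"batch={size:4d} docs={loaded} secs={elapsed:.3f} rate={rate:.0f}/s")
```

Running the same test case with different batch sizes, before and after each 
change to the load code, gives you numbers you can compare while you whittle 
the problem down.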

As to whether you will still be able to see the errors if you turn off "log 
uncaught errors": the errors will still go back to the client, but not to the 
logs.

-Danny

From: [email protected] 
[mailto:[email protected]] On Behalf Of Bob O
Sent: Tuesday, May 21, 2013 2:22 PM
To: MarkLogic Developer Discussion
Subject: Re: [MarkLogic Dev General] ML Project Issues

Danny,

So here's what I gathered so far to answer your questions:

*         I am confused what version you are on - is it 5.0-4.1 for this 
project 4.0?
We are currently on MarkLogic V5.0-4.1.

*         Is this production or development?  If it is production, you might 
consider contacting MarkLogic support.
I am using a test bed. We currently have a test, a dev, and a prod 
environment. We deploy VMs from prod.

*         Do you have a test cluster?  If not, I would make that a priority so 
you can try stuff easily.
I believe it's what I'm using right now, but it is shared by developers and 
engineers. Did you mean for me to build one on my own box?

*         600MB forests sound very large.  The rule-of-thumb for size is max 
200MB, so you are way off here.  But the important number is how many fragments 
are in the forests.  You should be able to get that number from the database 
status page (show forest info), or from the xdmp:forest-status function.
We have 4 forests (I'll call them F1, F2, F3, F4). Their sizes are 362MB, 
348MB, 340MB, and 345MB respectively. Each forest has over 48,000 fragments, 
with "deleted fragments" ranging from 9,500 to 11,100.  Each forest averages 
25,000+ documents.

*         How big are your VMs? (How much memory)
The VMs have 6GB of memory with 2 CPUs and 5 hard disks: 60GB for the 1st HD 
and 610GB for each of the remaining 4 HDs, for a total of 2,500GB (2.5TB) of 
disk space.

*         How many Range indexes?
I counted 50 Range Indexes and 6 Range Attribute Indexes.

*         How is your I/O rate on the system?  Ideally, it should be capable of 
roughly 20Mb/sec per forest.
I was told 200Mb/sec... I haven't verified this. Where can I verify this 
number? Under the Database status page?

*         As far as the logs, you can turn off log uncaught errors on the App 
Server doing the loads (although you might need that info).  The more 
interesting question is why the loads are throwing errors.
"Log uncaught errors" is turned ON on ALL http servers. Will I still be able 
to see and diagnose what errors I have if I turn this to "false"? I'm still 
looking into why the loads are failing.

*         How many nodes in this cluster?
I believe there's only one node and it is not coupled. Is there a way to 
check how many nodes there are?

Without much info, my guess is that finding some decent disk is a high 
priority.

That should give you a few things to scratch your head over.
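
For reference, the forest figures reported above can be tallied with a few 
lines of Python.  This is only a rough sketch: the counts are the lower 
bounds quoted in the message, and it assumes the "over 48,000 fragments" 
number does not already include the deleted fragments, which the message does 
not say either way.

```python
# Figures as reported in the message; counts are lower bounds ("over", "25,000+").
forest_sizes_mb = {"F1": 362, "F2": 348, "F3": 340, "F4": 345}
active_fragments = 48_000          # per forest; assumed to exclude deleted ones
deleted_fragments = (9_500, 11_100)  # lowest and highest per-forest values
docs_per_forest = 25_000


def deleted_share(active: int, deleted: int) -> float:
    """Fraction of all fragments in a forest that are marked deleted."""
    return deleted / (active + deleted)


total_mb = sum(forest_sizes_mb.values())
low_share = deleted_share(active_fragments, deleted_fragments[0])
high_share = deleted_share(active_fragments, deleted_fragments[1])
# Rough average on-disk size per document, in KB, across all four forests.
avg_kb_per_doc = total_mb * 1024 / (docs_per_forest * len(forest_sizes_mb))

if __name__ == "__main__":
    print(f"total forest size: {total_mb} MB")
    print(f"deleted share: {low_share:.1%} to {high_share:.1%}")
    print(f"average size per document: about {avg_kb_per_doc:.1f} KB")
```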

On Tue, May 21, 2013 at 11:21 AM, Danny Sokolsky 
<[email protected]> wrote:
Hi Bob,

Here are a few questions and a few things I would focus on:


*         I am confused what version you are on - is it 5.0-4.1 for this 
project 4.0?

*         Is this production or development?  If it is production, you might 
consider contacting MarkLogic support.

*         Do you have a test cluster?  If not, I would make that a priority so 
you can try stuff easily.

*         600MB forests sound very large.  The rule-of-thumb for size is max 
200MB, so you are way off here.  But the important number is how many fragments 
are in the forests.  You should be able to get that number from the database 
status page (show forest info), or from the xdmp:forest-status function.

*         How big are your VMs? (how much memory)

*         How many Range indexes?

*         How is your I/O rate on the system?  Ideally, it should be capable of 
roughly 20Mb/sec per forest.

*         As far as the logs, you can turn off log uncaught errors on the App 
Server doing the loads (although you might need that info).  The more 
interesting question is why are the loads throwing errors.

*         How many nodes in this cluster?

Without much info, my guess is that finding some decent disk is a high priority.

That should give you a few things to scratch your head over.

-Danny

From: [email protected] 
[mailto:[email protected]] On Behalf Of Bob O
Sent: Tuesday, May 21, 2013 8:39 AM
To: [email protected]
Subject: [MarkLogic Dev General] ML Project Issues

Hello Everyone,

I am taking over a new project that I would consider large scale. I was hired 
as an ML DBA, but I am really fairly new to MarkLogic. We were using ML 4.0; 
on this project they are using ML v5.0-4.1, and they deploy the product on VMs.

They are running into a bunch of issues and I feel overwhelmed by them. I have 
seen some of them before, but here are some of the issues:
1) logging issue: every time their ingestion errors out, it logs everything 
about it, which amounts to about 2Mb each time it happens. This happens quite 
often, and they are getting tons of logs in a short period of time. Is there a 
way to minimize what the logs spit out and cut down the unnecessary 
information?

2) ingestion is slow: anything could be causing the ingestion to be so slow. 
Where should I look for the cause? I have contacted the SW Developer about 
the ingestion process and am still waiting for his response. I am told that 
they are using an in-house app called DDMS that I am not familiar with.

3) forest space: how do I check if their forest space is enough? They have 4 
forests at around 600GB apiece. Is there a formula to properly figure out 
the space allocation for each forest and to plan for future use?

4) performance issues: they are experiencing some latency issues, CPU-IO 
scheduler, and they're fixing to buy NAS servers for their storage management.

I apologize for dropping all of these issues at once, but I figure there are 
more brains out there than this one. I feel I have taken on a much bigger task 
and role than I can handle. I appreciate any assistance or direction anyone 
can give.

--BobO

_______________________________________________
General mailing list
[email protected]
http://developer.marklogic.com/mailman/listinfo/general
