Thanks a bunch, Danny! When I posted these questions, I was feeling overwhelmed... I feel a little better now! I really want to learn MarkLogic because I can see it's going to help the government, especially the military, tremendously.
Thanks again for y'all's guidance!

Bob O.

On Tue, May 21, 2013 at 4:41 PM, Danny Sokolsky <[email protected]> wrote:

> OK, so it sounds like your forest sizes actually are in MB, not GB (unless you made the same mistake I did :), so that does not sound so bad. Given that you are using so many range indexes and that you only have a 6GB slice, you might be using up lots of memory. Look to tools like top to help you figure that out.
>
> I would try and concentrate on understanding your loading issues. You need to figure out things like what the code is doing, how it is structured, and be a little concrete about how fast or slow it is. For example, you might be locking more documents than you need to during load. So try and get a test case that shows your issues, then you can start to whittle it down. That is where having your own sandbox can help.
>
> As to whether you will be able to see the errors if you turn off uncaught errors, the errors will still go back to the client, but not to the logs.
>
> -Danny
>
> *From:* [email protected] [mailto:[email protected]] *On Behalf Of* Bob O
> *Sent:* Tuesday, May 21, 2013 2:22 PM
> *To:* MarkLogic Developer Discussion
> *Subject:* Re: [MarkLogic Dev General] ML Project Issues
>
> Danny,
>
> So here's what I gathered so far to answer your questions:
>
> · I am confused what version you are on – is it 5.0-4.1 for this project or 4.0?
>   We are currently on MarkLogic V5.0-4.1.
>
> · Is this production or development? If it is production, you might consider contacting MarkLogic support.
>   I am using a test bed. We currently have a test, a dev, and a prod environment. We deploy VMs from prod.
>
> · Do you have a test cluster? If not, I would make that a priority so you can try stuff easily.
>   I believe it's what I'm using right now, but it is shared by developers and engineers. Did you mean for me to build one on my own box?
>
> · 600MB forests sound very large. The rule-of-thumb for size is max 200MB, so you are way off here. But the important number is how many fragments are in the forests. You should be able to get that number from the database status page (show forest info), or from the xdmp:forest-status function.
>   We have 4 forests (I'll call them F1, F2, F3, F4). Their sizes are 362MB, 348MB, 340MB, and 345MB respectively. Each forest has over 48,000 fragments, with their "deleted fragments" ranging from 9,500 to 11,100. Each forest averages 25,000+ documents.
>
> · How big are your VMs? (How much memory)
>   The VMs have 6GB of memory with 2 CPUs and 5 hard disks (60GB for the first and 610GB for each of the remaining four, for a total of 2,500GB (2.5TB) of disk space).
>
> · How many Range indexes?
>   I counted 50 Range Indexes and 6 Range Attribute Indexes.
>
> · How is your I/O rate on the system? Ideally, it should be capable of roughly 20Mb/sec per forest.
>   I was told 200Mb/sec... I haven't verified this. Where can I verify this number? Under the Database status page?
>
> · As far as the logs, you can turn off log uncaught errors on the App Server doing the loads (although you might need that info). The more interesting question is why the loads are throwing errors.
>   "Log uncaught errors" is turned ON on ALL HTTP servers. Will I still be able to see and diagnose what errors I have if I turn this to "false"? I'm still looking into why the loads are failing.
>
> · How many nodes in this cluster? Without much info, my guess is that finding some decent disk is a high priority. That should give you a few things to scratch your head over.
>   I believe there's only one node and it is not coupled. Is there a way to check how many nodes there are?
>
> On Tue, May 21, 2013 at 11:21 AM, Danny Sokolsky <[email protected]> wrote:
>
> Hi Bob,
>
> Here are a few questions and a few things I would focus on:
>
> · I am confused what version you are on – is it 5.0-4.1 for this project or 4.0?
> · Is this production or development? If it is production, you might consider contacting MarkLogic support.
> · Do you have a test cluster? If not, I would make that a priority so you can try stuff easily.
> · 600MB forests sound very large. The rule-of-thumb for size is max 200MB, so you are way off here. But the important number is how many fragments are in the forests. You should be able to get that number from the database status page (show forest info), or from the xdmp:forest-status function.
> · How big are your VMs? (how much memory)
> · How many Range indexes?
> · How is your I/O rate on the system? Ideally, it should be capable of roughly 20Mb/sec per forest.
> · As far as the logs, you can turn off log uncaught errors on the App Server doing the loads (although you might need that info). The more interesting question is why the loads are throwing errors.
> · How many nodes in this cluster?
>
> Without much info, my guess is that finding some decent disk is a high priority.
>
> That should give you a few things to scratch your head over.
>
> -Danny
>
> *From:* [email protected] [mailto:[email protected]] *On Behalf Of* Bob O
> *Sent:* Tuesday, May 21, 2013 8:39 AM
> *To:* [email protected]
> *Subject:* [MarkLogic Dev General] ML Project Issues
>
> Hello Everyone,
>
> I am taking over a new project that I would consider large scale. I was hired as a ML DBA, but I am really fairly new at MarkLogic. We were using ML 4.0, and on this project they are using ML v5.0-4.1 and they deploy the product on VMs.
>
> They are running into a bunch of issues and I feel overwhelmed by it. I have seen some of it before, but some of the issues are these:
>
> 1) Logging issue: every time their ingestion errors out, it logs everything about it, which amounts to about 2Mb each time it happens. This happens quite often and they are getting tons of logs in a short period of time. Is there a way to minimize what the logs spit out and cut down the extra unnecessary information?
>
> 2) Ingestion is slow: this could be anything that's causing the ingestion to be so slow. Where should I look for the cause? I have contacted the SW developer on the ingestion process and am still waiting for his response. I am told that they are using an in-house app called DDMS that I am not familiar with.
>
> 3) Forest space: how do I check if their forest space is enough? They have 4 forests, and they are around 600GB apiece. Is there a formula to properly figure out the space allocation for each forest and to plan for future use?
>
> 4) Performance issues: they are experiencing some latency issues, CPU-IO scheduler, and they're fixing to buy NAS servers for their storage management.
>
> I apologize for dropping all of these issues at once, but I figure there are more brains out there than this one. I feel I have taken on a much bigger task and role than I can handle. I appreciate any assistance or direction anyone can give.
>
> --BobO
_______________________________________________
General mailing list
[email protected]
http://developer.marklogic.com/mailman/listinfo/general
