David, Charles is right: turning off merges is generally not a good idea and is useful only in rare circumstances. Merging happens within each forest rather than across forests. It copies the "stands" within a forest into larger stands and deletes the old stands over time. Because merging is an internal housekeeping operation, you should not normally need to be aware of merges or stands except when monitoring your system.
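To illustrate the housekeeping described above, here is a toy Python model of a single forest. It is a sketch under simplifying assumptions, not MarkLogic's actual merge policy. Each flush of the in-memory stand adds one on-disk stand. When merges are enabled, all stands are combined once their count reaches a hypothetical `merge_at` threshold. Both the threshold and the "one unit per flush" sizing are illustrative inventions.

```python
from typing import List, Optional


def simulate_stands(flushes: int, merge_at: Optional[int]) -> List[int]:
    """Toy model of one forest: on-disk stand count after each flush.

    Each flush saves the in-memory stand as one new on-disk stand.
    If merge_at is set, whenever the stand count reaches merge_at, all
    stands are copied into one larger stand and the old ones are dropped.
    (Hypothetical simplification: real merges are size-based, run in the
    background, and need not combine every stand at once.)
    """
    stands: List[int] = []   # fragment count of each on-disk stand
    history: List[int] = []  # stand count recorded after each flush
    for _ in range(flushes):
        stands.append(1)  # one flush's worth of fragments (arbitrary unit)
        if merge_at is not None and len(stands) >= merge_at:
            stands = [sum(stands)]  # merge: same fragments, one stand
        history.append(len(stands))
    return history


print(simulate_stands(10, None))  # merges off: [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
print(simulate_stands(10, 4))     # merges on:  [1, 2, 3, 1, 2, 3, 1, 2, 3, 1]
```

With merges off, the stand count in this model grows by one per flush without bound. In a real forest, that growth would eventually hit a stand limit, which is consistent with the XDMP-TOOMANYSTANDS error quoted further down the thread. With merges on, the count stays bounded while the total number of fragments is unchanged.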
Your expanded tree cache (ETC) errors are unrelated to merges. The ETC is allocated at a fixed size, and the merge process doesn't use any of it. The ETC must be able to hold all the database documents used by all concurrent, active requests on a host. Perhaps turning off merges somehow caused more read/query requests to run concurrently, which can exhaust the cache. Normally the ETC behaves like a true "cache" because it holds documents loaded by recently completed requests for a while. Your error arises because it must also hold everything the currently active requests are using.
Can you post the queries that are causing the ExpandedTreeCacheFull error? Often a query causes it by accessing too much of the database, and changing the query approach fixes it.
Yours,
Damon
--
Damon Feldman
Sr. Principal Consultant, MarkLogic
From: [email protected] [mailto:[email protected]] On Behalf Of Steiner, David J. (LNG-DAY)
Sent: Tuesday, October 30, 2012 8:51 AM
To: MarkLogic Developer Discussion
Subject: Re: [MarkLogic Dev General] Too Many Stands
So, if I understand correctly, you're saying that turning off merging isn't an effective technique to use during a load. However, merges use memory, and leaving them "on" interferes with other memory-based operations such as transformations. I keep getting expanded tree cache exceptions when I leave merging on during a load, and I don't when I turn it off. That appears to be because MarkLogic distributes the documents evenly across the forests. So when it's time to merge, they all merge at once, which leaves no memory for the collection and transformation work that's going on. So, instead of being able to load hundreds of millions of records in whatever form the data arrives in files, you're saying I need to figure out how to pre-batch my data to avoid memory issues?
That's just shifting the burden from one place to another, and it still doesn't help me load data. If my load dies because memory is exhausted, how is that any better than dying because I've run out of stands? Let me ask my question another way, then: in general, how many fragments make up a stand? I'm guessing it's the number of fragments, not document size, that creates new stands as data is loaded. My documents are only 3 elements and 100 to 200 characters at most.
David
From: [email protected] [mailto:[email protected]] On Behalf Of Charles Greer
Sent: Monday, October 29, 2012 1:21 PM
To: MarkLogic Developer Discussion
Cc: Steiner, David J. (LNG-DAY)
Subject: Re: [MarkLogic Dev General] Too Many Stands
My understanding is that you really should not turn off merges. Merging has improved a lot in later versions. It can hurt performance if the server starts merging during a big load, but MarkLogic now schedules and throttles merges much better than it used to. Moreover, merging substantially improves the database's performance afterward. I think you should look to other means of improving ingest times. Batch inserts (multiple documents per transaction) are probably the biggest improvement you can get, though of course this depends heavily on your document structures.
Charles
On 10/29/2012 09:53 AM, Steiner, David J. (LNG-DAY) wrote:
Hello,
I thought I'd seen in the documentation that one could "speed up" loading by turning off merges, so I did. It seemed to work pretty well until I got this error:
XDMP-TOOMANYSTANDS: xdmp:eval("import module namespace infodev = "http://marklogic.com/app...", (fn:QName("", "document"), fn:doc("[uri].xml"), fn:QName("", "path"), ...), <options xmlns="xdmp:eval"><database>1385720675613291619</database></options>) -- Too many stands
So apparently a periodic merge is required even to proceed with loading.
Is there documentation on how to tell when a merge will be needed? For instance, if I have X docs to load into Y forests, I can load at most X/Z docs, and then I'll need to merge manually before loading more.
Thanks,
David
--
Charles Greer
Senior Engineer
MarkLogic Corporation
[email protected]
Phone: +1 707 408 3277
www.marklogic.com
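David's question about how many documents can be loaded into Y forests before a merge is needed can be approached with back-of-envelope arithmetic. The Python sketch below rests on several hypothetical assumptions, none of which come from the thread:

* Documents are spread evenly across forests.
* Each document is one fragment.
* A new on-disk stand is written whenever the in-memory stand reaches a fixed `fragments_per_stand`.
* `max_stands` stands in for the per-forest stand limit.

Real flushes can be triggered by other in-memory size limits as well, and the example figures are made up. Check your own database configuration and the MarkLogic documentation for actual values.

```python
def _ceil_div(a: int, b: int) -> int:
    """Integer ceiling division (avoids float rounding on large counts)."""
    return -(-a // b)


def stands_per_forest(total_fragments: int, forests: int, fragments_per_stand: int) -> int:
    """Estimate on-disk stands per forest after a load with merges disabled.

    Hypothetical model: fragments are spread evenly across forests, and the
    in-memory stand is saved as a new on-disk stand every time it holds
    fragments_per_stand fragments. Other flush triggers are ignored.
    """
    per_forest = _ceil_div(total_fragments, forests)
    return _ceil_div(per_forest, fragments_per_stand)


def max_fragments_without_merge(forests: int, fragments_per_stand: int, max_stands: int) -> int:
    """Largest load (in fragments) that keeps every forest at or below
    max_stands under the same hypothetical model."""
    return forests * fragments_per_stand * max_stands


# Made-up example figures, for illustration only.
print(stands_per_forest(100_000_000, 6, 500_000))     # 34
print(max_fragments_without_merge(6, 500_000, 50))    # 150000000
```

Under these assumptions, loading more than `max_fragments_without_merge(...)` fragments with merges off would push some forest past its stand limit. Leaving merges on, as Charles and Damon advise, avoids this bookkeeping altogether.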
_______________________________________________
General mailing list
[email protected]
http://developer.marklogic.com/mailman/listinfo/general
