Re: [MarkLogic Dev General] Too Many Stands

Damon Feldman Tue, 30 Oct 2012 06:24:15 -0700

David,

Charles is right - turning off merges is not a great idea generally, and is 
only useful in rare circumstances. The merging is internal to the forests 
rather than an operation across forests - it copies the "stands" within a 
forest into larger stands and deletes the old stands over time. You should not 
normally have to be aware of merges or stands other than when monitoring your 
system because it is an internal housekeeping operation.


Your expanded tree cache errors are separate from merges because the 
expanded tree cache (ETC) is allocated at a set size, and the merge process 
doesn't use any of it up. The ETC must be able to hold all the documents from 
the database used by all the concurrent, active requests on a host. Perhaps 
turning off merges somehow caused more concurrent read/query requests to run 
at the same time, which can exhaust your cache. Note that normally the ETC is 
a real "cache" because it holds documents loaded by recent, completed requests 
for some time. But your error occurs because it also needs to hold everything 
used during the active requests.
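
As a rough illustration of the point above, here is a toy model in Python 
(not MarkLogic's actual accounting): the cache overflows when the documents 
held by the concurrently active requests exceed its fixed size, regardless of 
merge activity. The function name, the sizes in MB, and the request lists are 
all hypothetical.

```python
def etc_overflows(cache_size_mb, active_request_mb):
    """Toy model: True if the documents held by all concurrently active
    requests would not fit in a fixed-size expanded tree cache.
    All numbers are hypothetical; this is not how MarkLogic measures usage."""
    return sum(active_request_mb) > cache_size_mb


# Hypothetical 1024 MB cache with two different levels of concurrency.
few_requests = [200, 300]              # two active requests, 500 MB total
many_requests = [200, 300, 400, 250]   # four active requests, 1150 MB total

print(etc_overflows(1024, few_requests))   # False: 500 MB fits
print(etc_overflows(1024, many_requests))  # True: 1150 MB does not fit
```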

Can you post the queries you are running that are causing the 
ExpandedTreeCacheFull error? Often a query causes that because it is accessing 
too much of the database and can be fixed by changing the query approach.

Yours,
Damon

--
Damon Feldman
Sr. Principal Consultant, MarkLogic

From: Steiner, David J. (LNG-DAY)
Sent: Tuesday, October 30, 2012 8:51 AM
To: MarkLogic Developer Discussion
Subject: Re: [MarkLogic Dev General] Too Many Stands

So, if I understand correctly, you're saying that turning off merging isn't an 
effective technique to use during a load.
However, merges use memory and leaving them "on" interferes with other 
memory-based operations like transformations.  I continue to get expanded tree 
cache exceptions when I leave merging on while trying to load and I don't when 
I turn it off.
It would appear that's because ML evenly distributes the documents across the 
forests and thus, when it's time to merge, they all merge, which leaves no 
memory for the collections and transformations that are going on.
So, instead of being able to load hundreds of millions of records however the 
data comes to me in files, you're saying I need to figure out how to pre-batch 
my data to ensure that I don't have memory issues?  That's just shifting the 
burden from one place to another and it still doesn't help me load data.  If my 
load dies because memory gets exhausted, how's that any better than dying 
because I've run out of stands?

Let me try my question another way then: In general, how many fragments does it 
take to make up a stand?  I'm guessing it's a fragment thing that's making the 
newer stands as data is being loaded because it can't be a document size thing 
since my documents are 3 elements and at most 100 - 200 characters.

David

From: Charles Greer
Sent: Monday, October 29, 2012 1:21 PM
To: MarkLogic Developer Discussion
Cc: Steiner, David J. (LNG-DAY)
Subject: Re: [MarkLogic Dev General] Too Many Stands

My understanding is that you should really not turn off merges -- merging has 
improved a lot in later versions, and while it can be a performance hit if the 
server starts merging during a big load, MarkLogic does a better job now with 
scheduling and throttling merges than in the past.  Moreover, merging greatly 
improves the performance of the database after the fact.

I think you should probably look to other means for helping with ingest times 
-- batch inserts (multiple docs per transaction) are probably the biggest 
improvement you can get (but of course this is highly dependent on your 
document structures).
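
A minimal sketch of the batching idea, in Python and independent of any 
MarkLogic client API: group the incoming documents into fixed-size batches and 
hand each batch to a single insert call. The insert_batch callback is 
hypothetical; it stands in for whatever code writes a list of documents in one 
transaction. The batch size of 100 is only a placeholder.

```python
from itertools import islice


def batches(docs, batch_size):
    """Yield successive lists of at most batch_size items from docs."""
    it = iter(docs)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch


def load(docs, insert_batch, batch_size=100):
    """Call insert_batch once per batch and return the number of calls.

    insert_batch is a hypothetical callback that would write every document
    in the list within a single transaction.
    """
    calls = 0
    for batch in batches(docs, batch_size):
        insert_batch(batch)
        calls += 1
    return calls
```

With 250 documents and a batch size of 100, load would invoke insert_batch 
three times (100, 100, and 50 documents) instead of 250 times.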

Charles



On 10/29/2012 09:53 AM, Steiner, David J. (LNG-DAY) wrote:
Hello,

I thought I'd seen in the documentation that one could "speed up" loading by 
turning off merges, so I did.  It seemed to work pretty well until I got this 
error:

XDMP-TOOMANYSTANDS: xdmp:eval("import module namespace infodev = 
&quot;http://marklogic.com/app...";, (fn:QName("", "document"), 
fn:doc("[uri].xml"), fn:QName("", "path"), ...), <options 
xmlns="xdmp:eval"><database>1385720675613291619</database></options>) -- Too 
many stands

So, apparently a periodic merge is required to even proceed with loading.  Is 
there documentation on how to know when a merge would be needed?  For instance, 
I have X docs to load into Y forests so at most I can load X/Z docs, then I'll 
need to manually merge before more loading.

Thanks,
David



--
Charles Greer
Senior Engineer
MarkLogic Corporation
Phone: +1 707 408 3277
www.marklogic.com




