Re: [MarkLogic Dev General] Too Many Stands

Geert Josten Tue, 30 Oct 2012 07:20:16 -0700

Hi David,



Damon might have meant it the other way around. When merges are active,
that can slow down processes in general, making requests take longer. That
increases the chance that there will be too many of the problematic queries
running in parallel, thus exhausting the cache.



I agree with Damon that it sounds like you have an inefficient query that
is causing the trouble. Can you isolate it, and profile it or show us some
bits of it?



Kind regards,

Geert



*From:* [email protected] [mailto:[email protected]]
*On Behalf Of* Steiner, David J. (LNG-DAY)
*Sent:* Tuesday, October 30, 2012 2:28 PM
*To:* MarkLogic Developer Discussion
*Subject:* Re: [MarkLogic Dev General] Too Many Stands



Damon,



Maybe I wasn’t clear.  Merges off – I get “too many stands” and no caching
errors.  Merges on – I get “expanded tree cache” errors.



Looks like I’ll give up on being able to load the data without manual
intervention.



Thanks anyway,

David



*From:* [email protected] [mailto:[email protected]]
*On Behalf Of* Damon Feldman
*Sent:* Tuesday, October 30, 2012 9:24 AM
*To:* MarkLogic Developer Discussion
*Subject:* Re: [MarkLogic Dev General] Too Many Stands



David,



Charles is right – turning off merges is not a great idea generally, and is
only useful in rare circumstances. The merging is internal to the forests
rather than an operation across forests – it copies the “stands” within a
forest into larger stands and deletes the old stands over time. You should
not normally have to be aware of merges or stands other than when
monitoring your system because it is an internal housekeeping operation.
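
The housekeeping described above can be pictured with a toy model. This is only a sketch under stated assumptions, not how the server actually schedules merges: it assumes each flush to disk adds one stand, that a merge collapses all stands into one, and that a forest refuses new stands past a fixed limit. The function name `simulate_stands` and the `max_stands` and `merge_threshold` values are hypothetical, chosen for illustration.

```python
def simulate_stands(flushes: int, merge_enabled: bool,
                    max_stands: int = 64, merge_threshold: int = 8) -> int:
    """Return the stand count of one forest after `flushes` new stands.

    Toy model (all numbers are illustrative assumptions):
    - every flush adds exactly one stand;
    - with merging enabled, reaching merge_threshold collapses all
      stands into a single larger stand;
    - trying to add a stand when max_stands already exist fails.
    """
    stands = 0
    for _ in range(flushes):
        if stands >= max_stands:
            raise RuntimeError("too many stands")
        stands += 1
        if merge_enabled and stands >= merge_threshold:
            stands = 1  # old stands copied into one larger stand, then deleted
    return stands
```

In this model, a load with merging enabled never gets near the limit, while the same load with merging disabled fails once the stand count reaches it, which matches the shape of the error reported in this thread.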



Your expanded tree cache errors will be separate from merges because the
expanded tree cache is allocated to a set size, and the merge process
doesn’t use any of it up. The expanded tree cache must be able to hold all
the documents from the database used by all the concurrent, active requests
on a host. Perhaps turning off merges somehow caused more concurrent
read/query requests to run at the same time, which can exhaust your cache.
Note that normally, the ETC is a real “cache” because it holds documents
loaded by recent, completed requests for some time. But your error is
because it also needs to hold everything used during the active requests.
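
A back-of-the-envelope way to reason about this is to estimate how much the active requests need to hold at once. The sketch below is a deliberate simplification: it assumes demand scales linearly with the number of concurrent requests and the documents each one touches, and every name and number in it is hypothetical rather than a MarkLogic setting.

```python
def etc_demand_mb(concurrent_requests: int, docs_per_request: int,
                  avg_expanded_doc_kb: float) -> float:
    """Rough memory (MB) needed to hold every document in use by active requests."""
    return concurrent_requests * docs_per_request * avg_expanded_doc_kb / 1024.0


def fits_in_cache(demand_mb: float, cache_size_mb: float) -> bool:
    """True if the estimated demand fits in a cache of the given fixed size."""
    return demand_mb <= cache_size_mb


# Hypothetical numbers: the same query is fine at low concurrency
# but exceeds a fixed-size cache when more copies run in parallel.
low = etc_demand_mb(concurrent_requests=2, docs_per_request=10_000, avg_expanded_doc_kb=4)
high = etc_demand_mb(concurrent_requests=8, docs_per_request=10_000, avg_expanded_doc_kb=4)
print(fits_in_cache(low, cache_size_mb=256), fits_in_cache(high, cache_size_mb=256))
```

The estimate suggests two levers: reduce how many documents each request touches, or reduce how many such requests run at the same time.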



Can you post the queries you are running that are causing the
ExpandedTreeCacheFull error? Often a query causes that because it is
accessing too much of the database and can be fixed by changing the query
approach.



Yours,

Damon



--

Damon Feldman

Sr. Principal Consultant, MarkLogic



*From:* [email protected] [mailto:[email protected]]
*On Behalf Of* Steiner, David J. (LNG-DAY)
*Sent:* Tuesday, October 30, 2012 8:51 AM
*To:* MarkLogic Developer Discussion
*Subject:* Re: [MarkLogic Dev General] Too Many Stands



So, if I understand you correctly, you’re saying that turning off merging
isn’t an effective technique to use during a load.

However, merges use memory and leaving them “on” interferes with other
memory-based operations like transformations.  I continue to get expanded
tree cache exceptions when I leave merging on while trying to load, and I
don’t when I turn it off.

It would appear that’s because ML evenly distributes the documents across
the forests and thus, when it’s time to merge, they all merge, which leaves
no memory for the collections and transformations that are going on.

So, instead of being able to load hundreds of millions of records however
the data comes to me in files, you’re saying I need to figure out how to
pre-batch my data to ensure that I don’t have memory issues?  That’s just
shifting the burden from one place to another and it still doesn’t help me
load data.  If my load dies because memory gets exhausted, how’s that any
better than dying because I’ve run out of stands?



Let me try my question another way then: In general, how many fragments
does it take to make up a stand?  I’m guessing that the number of fragments
is what creates the newer stands as data is being loaded.  It can’t be
document size, since my documents have 3 elements and at most 100–200
characters.



David



*From:* [email protected] [mailto:[email protected]]
*On Behalf Of* Charles Greer
*Sent:* Monday, October 29, 2012 1:21 PM
*To:* MarkLogic Developer Discussion
*Cc:* Steiner, David J. (LNG-DAY)
*Subject:* Re: [MarkLogic Dev General] Too Many Stands



My understanding is that you should really not turn off merges -- merging
has improved a lot in later versions, and while it can be a performance hit
if the server starts merging during a big load, MarkLogic does a better job
now with scheduling and throttling merges than in the past.  Moreover,
merging substantially improves the performance of the database afterward.

I think you should probably look to other means for helping with ingest
times -- batch inserts (multiple docs per transaction) are probably the
biggest improvement you can get (but of course this is highly dependent on
your document structures).
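
A minimal sketch of the batching idea on the client side, assuming a hypothetical `insert_batch` callable that writes one list of documents in a single transaction. The `(uri, xml_text)` document shape and the default batch size of 100 are assumptions for illustration; how the batch is actually sent to the server is left out.

```python
from itertools import islice
from typing import Callable, Iterable, Iterator, List, Tuple

Doc = Tuple[str, str]  # (uri, xml_text); hypothetical shape for illustration


def batched(docs: Iterable[Doc], batch_size: int) -> Iterator[List[Doc]]:
    """Yield successive lists of at most batch_size documents."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    it = iter(docs)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch


def load_in_batches(docs: Iterable[Doc],
                    insert_batch: Callable[[List[Doc]], None],
                    batch_size: int = 100) -> int:
    """Call insert_batch once per batch; return the number of calls made."""
    calls = 0
    for batch in batched(docs, batch_size):
        insert_batch(batch)  # hypothetical: one transaction per call
        calls += 1
    return calls
```

Because `batched` consumes the input lazily, the documents can be streamed from files in whatever order they arrive without holding the whole load in memory.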

Charles



On 10/29/2012 09:53 AM, Steiner, David J. (LNG-DAY) wrote:

Hello,



I thought I’d seen in the documentation that one could “speed up” loading
by turning off merges, so I did.  It seemed to work pretty well until I got
this error:



XDMP-TOOMANYSTANDS: xdmp:eval("import module namespace infodev = &quot;
http://marklogic.com/app...";, (fn:QName("", "document"),
fn:doc("[uri].xml"), fn:QName("", "path"), ...), <options
xmlns="xdmp:eval"><database>1385720675613291619</database></options>) --
Too many stands



So, apparently a periodic merge is required to even proceed with loading.
Is there documentation on how to know when a merge would be needed?  For
instance, I have X docs to load into Y forests so at most I can load X/Z
docs, then I’ll need to manually merge before more loading.



Thanks,
David



-- 

Charles Greer

Senior Engineer

MarkLogic Corporation

[email protected]

Phone: +1 707 408 3277

www.marklogic.com




_______________________________________________
General mailing list
[email protected]
http://developer.marklogic.com/mailman/listinfo/general
