Hi Mike, It is hard to answer a question like this in generalities. But here are a few random ideas, probably most of which you have tried:

Have you tried profiling the query? That can point you to hot spots fairly quickly.

I'm not totally sure what you mean by "the breakdown reported by the optimizer". Do you mean in xdmp:plan? xdmp:query-trace? Something else?

What version of MarkLogic are you running (xdmp:version())?

Are you using range index lookups to find the links (with a cts:query param, for example)?

When you say you are doing node replaces, do you mean you are writing each document multiple times? That can get expensive, and it is often faster to create a new version of the document in memory and then write the document once. There is a library to do in-memory node-replaces too if you don't feel like writing that yourself.

-Danny

-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of Mike Sokolov
Sent: Wednesday, August 08, 2012 10:41 AM
To: MarkLogic Developer Discussion
Subject: [MarkLogic Dev General] my link resolver is slow

I've written some code to resolve links in a batch process; the links can point to a number of different element/@id in any document, and we are trying to record the destination document uri with the link so it can be rendered quickly at run-time, and missing links won't be rendered at all.

Basically the process is: for each of some batch of documents, for each of its links, search for the matching document, and replace the link with an element having a uri attribute pointing to that document.

Overall, this process is running much slower than I had expected. I've been examining the query using the profiler, and after doing some optimization of the searches, I find something a bit strange. The breakdown reported by the optimizer doesn't seem to account for the total time. It looks to me as if all the searches are completing fairly quickly, based on logging statements that indicate all the documents in the batch have been "processed", and then the query just seems to hang for a while before returning.
It seems to spend about 90% of the total time in this second stage. My assumption is this time is spent performing the updates, committing, indexing, writing a journal file, or something like that.

My question is: should I expect this to be reflected in the optimizer? And is there some way I can figure out why it is taking so long, and what I can do about it? Maybe inserting a node would be faster than replacing? I've tried a tree-walk rather than lots of node-replaces, but that actually seemed quite a bit slower.

Thanks for any suggestions!

--
Michael Sokolov
Engineering Director
www.ifactory.com

_______________________________________________
General mailing list
[email protected]
http://developer.marklogic.com/mailman/listinfo/general
