A short update: After refactoring and running with an expanded tree cache size of 6144 MB, the full version of this script completed without exception. I'm also in contact with support and another individual who may know better ways to accomplish this task.
Special thanks to this forum, whose members consistently come through for one another.

-Brent

________________________________
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Hartwig, Brent (CL Tech Sv)
Sent: Tuesday, December 02, 2008 12:54 PM
To: General Mark Logic Developer Discussion
Subject: [MarkLogic Dev General] Contending with the expanded tree cache

Hello,

I am receiving an exception reading "Expanded tree cache full on host" (XDMP-EXPNTREECACHEFULL). Thanks to previous posts and ML documentation, I employed paging (75,000 / page) and temporarily tripled the expanded tree cache size (from 2048 to 6144). I find it odd that doubling the cache size allowed the script to get only a little further, disproportionately little. It makes me wonder if the cache is retaining results from previous lines of the same script, and if there is a way to flush the cache midstream.

I am using ML 3.2-5 and attempting to pull stats on files and users via an on-demand script:

1. Create a list of URIs representing all files: <uri ext="{$ext}">{$uri}</uri>

2. For each of the 9 types of files we're reporting on (images, audio/video, XML, etc.), iterate through the file URI list to create a sub-list. These lists are created using the file extension portion of the URI.

3. After each sub-list is created, calculate the total file size using fn:sum() and a metadata value from the file's properties. This is where I encountered my first instance of the EXPNTREECACHEFULL exception. Paging kept the cache size below the original setting.

4. Create a distinct list of file extensions via fn:distinct-values().

5. Lastly, create a distinct list of users that modified one or more files. This is the second instance of the EXPNTREECACHEFULL exception.

For step 5 I added paging, same as in step no. 3. The exception was then thrown in the 3rd set. I doubled the expanded tree cache size. The exception was then thrown in the 4th set. After tripling the cache size, the script completed, yet I had only scripted 6 of 14 pages.

Given I can only increase the cache size so much, I'm curious what my alternatives are for these large jobs. I'll try a couple more tests after hitting the send button, namely a) all 14 pages using the tripled cache size, b) reverse the processing order or split the script into two, and c) reduce the page size. The purpose of test "b." is to identify whether I'm just doing too much in one script or this part of the script is asking too much.

Below is one of the exceptions and the associated snippet.

    com.marklogic.xcc.exceptions.XQueryException: XDMP-EXPNTREECACHEFULL: for $uri as item()* in $uris-all[$level-2 + 1 to min(($cnt-all, $level-3))] -- Expanded tree cache full on host

    ...
    let $users-3 :=
      if ($cnt-all > $level-2) then
        fn:distinct-values(
          for $uri in $uris-all[($level-2 + 1) to fn:min(($cnt-all,$level-3))]
          return xdmp:document-properties($uri)/prop:properties/meta:r_modifier)
      else ()

Note that the above use of fn:distinct-values() is present under the belief that it would lighten the load on a subsequent call to fn:distinct-values(($users-1, $users-2, $users-3, ...)). This is probably unnecessary, as ($users-n * the number of pages) will be significantly smaller than $uris-all.

Thank you in advance for your time and thoughts.

-Brent
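
For readers following along, the paging pattern described above (a distinct list per page, then a final distinct pass over the combined per-page results) might look roughly like the sketch below. It is only an illustration under stated assumptions, not the original script: the variable names $modifiers, $page-size, $page-count and $per-page are hypothetical, the in-memory sequence stands in for the r_modifier values that the original reads through xdmp:document-properties(), and fn:subsequence() is used here in place of the range predicate from the snippet. It is written in standard XQuery 1.0 syntax and has not been checked against the 3.2-5 dialect.

```xquery
(: Stand-in data (assumption): one modifier name per file, in URI order.
   In the original script these values come from the files' properties. :)
let $modifiers := ("ann", "bob", "ann", "cat", "bob")

(: Hypothetical page size; the original post uses 75,000 per page. :)
let $page-size := 2
let $cnt-all := fn:count($modifiers)

(: Number of pages, rounded up using integer division. :)
let $page-count := ($cnt-all + $page-size - 1) idiv $page-size

(: First pass: reduce each page to its own distinct values. :)
let $per-page :=
  for $p in 1 to $page-count
  let $first := ($p - 1) * $page-size + 1
  return fn:distinct-values(fn:subsequence($modifiers, $first, $page-size))

(: Second pass: distinct values across the much smaller combined list. :)
return fn:distinct-values($per-page)
```

Whether the first pass actually lowers expanded tree cache pressure depends on how long the server keeps each page's fragments in the cache within a single request, which this sketch does not address.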
