Hello,

I am receiving an exception reading "Expanded tree cache full on host" 
(XDMP-EXPNTREECACHEFULL). Thanks to previous posts and ML documentation, I 
employed paging (75,000 / page) and temporarily tripled the expanded tree cache 
size (from 2048 to 6144).

I find it odd that doubling the cache size allowed the script to get only a 
little further, far less than proportionally. It makes me wonder whether the 
cache is retaining results from earlier expressions in the same script, and 
whether there is a way to flush the cache midstream.

I am using ML 3.2-5 and attempting to pull stats on files and users via an 
on-demand script:

1. Create a list of URIs representing all files: <uri ext="{$ext}">{$uri}</uri>

2. For 9 different types of files we're reporting on (images, audio/video, XML, 
etc.), iterate through the file URI list to create a sub-list. These lists are 
created using the file extension portion of the URI.

3. After each sub-list is created, calculate the total file size using fn:sum() 
and a metadata value from the file's properties. This is where I encountered my 
first instance of the EXPNTREECACHEFULL exception. Adding paging kept cache 
usage below the original setting.

4. Create a distinct list of file extensions via fn:distinct-values().

5. Lastly, create a distinct list of users who modified one or more files. 
This is the second instance of the EXPNTREECACHEFULL exception. I added paging, 
same as in step 3. The exception was then thrown in the 3rd page. I doubled 
the expanded tree cache size. The exception was then thrown in the 4th page.
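
To make steps 2 and 3 concrete, here is a minimal, unpaged sketch of their 
general shape. It is not the actual script: the extension values and the 
meta:file-size property are hypothetical placeholders, $uris-all is the list 
from step 1, and the meta: prefix is assumed to be declared elsewhere in the 
script. It has not been run against 3.2-5.

```xquery
(: Sketch only. $uris-all holds the <uri ext="..."> elements from step 1.
   The extensions and meta:file-size are hypothetical placeholders. :)
let $uris-images := $uris-all[@ext = ("jpg", "gif", "png")]  (: step 2: sub-list :)
return
  (: step 3: total size, read from each file's properties :)
  fn:sum(
    for $uri in $uris-images
    return xs:integer(
      xdmp:document-properties($uri)/prop:properties/meta:file-size))
```

The paging mentioned in step 3 wraps the inner "for" in a slice of 75,000 URIs 
at a time and sums the per-page totals.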

After I tripled the cache size, the script completed, but I had only scripted 
6 of the 14 pages. Given that I can only increase the cache size so much, I'm 
curious what my alternatives are for these large jobs. I'll try a few more 
tests after hitting the send button, namely a) run all 14 pages using the 
tripled cache size, b) reverse the processing order or split the script in two, 
and c) reduce the page size. The purpose of test "b" is to identify whether I'm 
simply doing too much in one script or whether this part of the script is 
asking too much on its own.

Below is one of the exceptions and associated snippet.

com.marklogic.xcc.exceptions.XQueryException: XDMP-EXPNTREECACHEFULL: for $uri 
as item()* in $uris-all[$level-2 + 1 to min(($cnt-all, $level-3))] -- Expanded 
tree cache full on host ...

let $users-3 :=
   if ($cnt-all > $level-2) then
     fn:distinct-values(
       for $uri in $uris-all[($level-2 + 1) to fn:min(($cnt-all,$level-3))]
         return xdmp:document-properties($uri)/prop:properties/meta:r_modifier)
   else ()

Note that fn:distinct-values() is used above in the belief that it would 
lighten the load on a subsequent call to fn:distinct-values(($users-1, 
$users-2, $users-3, ...)). This is probably unnecessary, as the combined size 
of the $users-n lists across all pages will be significantly smaller than 
$uris-all.
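
For reference, the per-page blocks all follow the same pattern as the snippet 
above, so they could in principle be collapsed into one loop. The following is 
only a sketch under assumptions: $page-size, $page-count, $start and $end are 
names introduced here for illustration, $uris-all and the meta: prefix come 
from the full script, and the slice predicate mirrors the one in the snippet. 
It has not been run against 3.2-5, and since it still executes as a single 
query I would not expect it to change the cache behavior by itself.

```xquery
(: Sketch only: generalizes $users-1, $users-2, ... into a loop over pages.
   $page-size, $page-count, $start and $end are illustrative names. :)
let $page-size := 75000
let $cnt-all := fn:count($uris-all)
let $page-count := xs:integer(fn:ceiling($cnt-all div $page-size))
return
  (: outer call merges the per-page user lists :)
  fn:distinct-values(
    for $page in 1 to $page-count
    let $start := ($page - 1) * $page-size + 1
    let $end := fn:min(($cnt-all, $page * $page-size))
    return
      (: inner call de-duplicates within one page, as in the snippet :)
      fn:distinct-values(
        for $uri in $uris-all[$start to $end]
        return xdmp:document-properties($uri)/prop:properties/meta:r_modifier))
```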

Thank you in advance for your time and thoughts.

-Brent