Michael-

Breaking up the updates into smaller batches is definitely possible, but...

Are you saying that holding the locks is causing the expanded tree cache to 
fill up?
"I like to use batches of 500 or 1000" - My natural inclination is to draw a 
parallel to SQL processing. I often have to do this type of update against 
Oracle databases, and my normal practice would be to write a similar PL/SQL 
routine that commits every 500-1000 rows or so to release locks, especially 
when doing this type of update on an OLTP-type system.

Is there a way to insert a "commit" within the XQuery that would have the 
same effect (ideally allowing the processing to continue to completion 
without breaking it up into multiple calls)? What would that look like in 
code? I've always heard that ML has transactional commit/rollback 
capability, but in practice my observation has been that updates seem to 
commit immediately.

Thanks in advance-

Art


-----Original Message-----
From: Michael Blakeley [mailto:[email protected]] 
Sent: Friday, September 24, 2010 12:59 AM
To: General Mark Logic Developer Discussion
Cc: Zegarek, Arthur
Subject: Re: [MarkLogic Dev General] XDMP-EXPNTREECACHEFULL on ML 3.2

Arthur, I'm sorry to see that you wrote that much code while looking for 
a solution. Sometimes it's helpful to search for an error message if 
it's new to you: http://www.google.com/search?q=XDMP-EXPNTREECACHEFULL 
and http://marklogic.markmail.org/search/?q=XDMP-EXPNTREECACHEFULL are 
good places to start. The short story is that the query's working set 
has to fit in the expanded tree cache.

Moving on to remedies, tuning the in-memory tree size will not affect 
XDMP-EXPNTREECACHEFULL. If you want to try tuning the server, then tune 
the expanded tree cache size. However, it's usually better to tune the 
query. XDMP-EXPNTREECACHEFULL usually means that your query is 
over-ambitious. The query might not be using indexes efficiently, or it 
might simply be a query with a gigantic working set.

I see that this is an update query. ACID properties require a read-lock 
on every document read by the query, and a write-lock on every document 
that is updated. If you expect to have 300k of these documents for the live 
system, then I would recommend breaking the work up into smaller 
transactions. While it is possible to modify 300k (or more) documents in 
one transaction, it is usually more efficient to modify a batch of 
documents at a time. I like to use batches of 500 or 1000. Besides 
performance concerns, this technique is helpful when you encounter an 
error in the 299,999th document: you only have to reprocess that batch.
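
As a rough sketch of the batching idea above (this is not Mike's code and 
not how CoRB is implemented): a client-side driver could split the control 
list into batches of 500 and send each batch as its own request, so that 
each batch commits or fails separately. Python is used here only as a 
neutral driver language. `submit_batch` is a hypothetical placeholder for 
whatever call sends one batch of product ids to the server.

```python
from typing import Callable, Iterator, List, Sequence, Tuple


def batches(items: Sequence[str], size: int = 500) -> Iterator[Sequence[str]]:
    """Yield consecutive slices of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def run_in_batches(
    items: Sequence[str],
    submit_batch: Callable[[Sequence[str]], None],
    size: int = 500,
) -> List[Tuple[int, Exception]]:
    """Submit each batch separately and collect (batch_index, error) pairs.

    `submit_batch` is an assumption for illustration: it stands in for one
    request to the server, i.e. one transaction that commits or fails on
    its own without affecting the other batches.
    """
    failed: List[Tuple[int, Exception]] = []
    for index, batch in enumerate(batches(items, size)):
        try:
            submit_batch(batch)
        except Exception as err:
            # Only this batch needs to be reprocessed later.
            failed.append((index, err))
    return failed
```

With 300,000 products and a batch size of 500, a driver like this would 
make 600 requests, and an error in the last document would mean rerunning 
only the final batch.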

Finally, you might be interested in http://marklogic.github.com/corb/ 
which is intended to help automate this sort of bulk-update.

-- Mike

On 2010-09-23 20:14, Zegarek, Arthur wrote:
> I am getting an XDMP-EXPNTREECACHEFULL error – not sure how to get around it.
>
> Trying to write an XQuery that reads through a control list to obtain a list 
> of catalog elements that require updating, along with a start_date/end_date 
> value to update in the main catalog.
>
> When I load the control XML (DevWithoutExclProduct.xml) up with more than 
> 2000 items or so, I get XDMP-EXPNTREECACHEFULL. The in-memory tree size is 
> set to 1 GB.
>
> I have 3 versions of the code – details below. I would think the 2nd or 3rd 
> versions would not incur the problem, since in those versions I isolate the 
> logic in a function that is called with just a single node each time. I 
> understand the issue is too many nodes being kept in scope, but how can you 
> get around this in a single XQuery call, without breaking up the data to make 
> multiple calls to ML Server? If I limit DevWithoutExclProduct.xml to 1500 
> products or so, it runs through without the exception.
>
> Currently running this in our dev environment where we have approx 54000 
> products in the Internal collection . In Prod it is more like 300000.
>
> Version 1:
> declare namespace RSUITE="http://www.reallysi.com";
> declare namespace adbl="http://www.audible.com/publisherToRepository";
>
> for $excl in doc("DevWithoutExclProduct.xml")/prods/product, $rsuite in 
> collection("Internal")/product
>
> let $excl_product := $rsuite/adbl:METADATA/adbl:CORE/adbl:EXCLUSIVE_PRODUCT
> let $prod_id := $excl/prods/product/prod_id
>
> let $excl_content := $rsuite/adbl:METADATA/adbl:CORE/adbl:EXCLUSIVE_CONTENT
> let $excl_product := $rsuite/adbl:METADATA/adbl:CORE/adbl:EXCLUSIVE_PRODUCT
>
> let $exi := exists($excl_product )
>
> let $start := $excl/start
> let $end := $excl/end
>
> let $repl_node 
> :=<adbl:EXCLUSIVE_PRODUCT><adbl:START_DATE>{$start/text()}</adbl:START_DATE><adbl:END_DATE>{$end/text()}</adbl:END_DATE></adbl:EXCLUSIVE_PRODUCT>
>
> where $rsuite/adbl:METADATA/adbl:CORE/adbl:ID/text() = $excl/prod_id/text()
>
>
> return
> <a>
> {
> if( $exi = true() )
> then
>     xdmp:node-replace($excl_product, $repl_node)
> else
>   xdmp:node-insert-after($excl_content, $repl_node)
>
> }
> </a>
>
> In this version, the control list, DevWithoutExclProduct.xml, is joined in 
> the same for loop as the main catalog
> Error returned is:
> XDMP-EXPNTREECACHEFULL: for $rsuite as item()* in 
> collection("Internal")/child::product -- Expanded tree cache full on host 
> rsuite.ofc.dev.ewr.audible.com
> line 4
> =
> /use-cases/eval2.xqy line 2
>
>
> Version 2 – Here I tried isolating the functionality in a function, and call 
> the function in a separate for loop that reads through the control list. So I 
> am unclear why I am still getting the tree cache full error, given that the 
> function is called with just a single node each time. Note the difference in 
> the error reported – here there are 2 lines mentioned.
>
> declare namespace RSUITE="http://www.reallysi.com";
> declare namespace adbl="http://www.audible.com/publisherToRepository";
>
> define function update_excl_prod( $excl as node() ) as element()   (: Call 
> with just 1 node !  :)
> {
>
> for $rsuite in collection("Internal")/product
>
> let $excl_product := $rsuite/adbl:METADATA/adbl:CORE/adbl:EXCLUSIVE_PRODUCT
> let $prod_id := $excl/prods/product/prod_id
>
> let $excl_content := $rsuite/adbl:METADATA/adbl:CORE/adbl:EXCLUSIVE_CONTENT
> let $excl_product := $rsuite/adbl:METADATA/adbl:CORE/adbl:EXCLUSIVE_PRODUCT
>
> let $exi := exists($excl_product )
>
> let $start := $excl/start
> let $end_dt := $excl/end
>
> let $repl_node 
> :=<adbl:EXCLUSIVE_PRODUCT><adbl:START_DATE>{$start/text()}</adbl:START_DATE><adbl:END_DATE>{$end_dt/text()}</adbl:END_DATE></adbl:EXCLUSIVE_PRODUCT>
>
> where $rsuite/adbl:METADATA/adbl:CORE/adbl:ID/text() = $excl/prod_id/text()
>
> return
> if( $exi = true() )
> then
>     xdmp:node-replace($excl_product, $repl_node)
> else
>   xdmp:node-insert-after($excl_content, $repl_node)
> }
>
> <result>{
> for $excl in doc("DevWithoutExclProduct.xml")/prods/product
> return update_excl_prod( $excl )
> }
>
> </result>
>
> Error returned here is:
> XDMP-EXPNTREECACHEFULL: for $rsuite as item()* in 
> collection("Internal")/child::product -- Expanded tree cache full on host 
> rsuite.ofc.dev.ewr.audible.com
> line 7
> =
> line 34
> =
> /use-cases/eval2.xqy line 2
>
> Version 3 – Here I tried limiting the for expression to search for the item, 
> and removed the where clause. Same exception.
>
> declare namespace RSUITE="http://www.reallysi.com";
> declare namespace adbl="http://www.audible.com/publisherToRepository";
>
> define function update_excl_prod( $excl as node() ) as element()   (: Call 
> with just 1 node !  :)
> {
>
> for $rs  in 
> fn:collection('Internal')/product/adbl:METADATA/adbl:CORE/adbl:ID[.= 
> $excl/prod_id/text() ]
>
> let $rsuite   := doc(xdmp:node-uri($rs  ))/product
>
> let $excl_product := $rsuite/adbl:METADATA/adbl:CORE/adbl:EXCLUSIVE_PRODUCT
> let $prod_id := $excl/prods/product/prod_id
>
> let $excl_content := $rsuite/adbl:METADATA/adbl:CORE/adbl:EXCLUSIVE_CONTENT
> let $excl_product := $rsuite/adbl:METADATA/adbl:CORE/adbl:EXCLUSIVE_PRODUCT
>
> let $exi := exists($excl_product )
>
> let $start := $excl/start
> let $end_dt := $excl/end
>
> let $repl_node 
> :=<adbl:EXCLUSIVE_PRODUCT><adbl:START_DATE>{$start/text()}</adbl:START_DATE><adbl:END_DATE>{$end_dt/text()}</adbl:END_DATE></adbl:EXCLUSIVE_PRODUCT>
>
> (: where $rsuite/adbl:METADATA/adbl:CORE/adbl:ID/text() = 
> $excl/prod_id/text()   :)
>
> return
> if( $exi = true() )
> then
>     xdmp:node-replace($excl_product, $repl_node)
> else
>   xdmp:node-insert-after($excl_content, $repl_node)
> }
>
> <result>{
> for $excl in doc("DevWithoutExclProduct.xml")/prods/product
> return update_excl_prod( $excl )
> }
>
> </result>
>
>
> Art
>
>
> Art Zegarek  |  Director of Data Architecture
> T: 973.820.0396    F: 973.820.0505    C: 732-735-2592
>
> audible.com
> 1 Washington Park, 16th Floor, Newark, NJ 07102
>
>


_______________________________________________
General mailing list
[email protected]
http://developer.marklogic.com/mailman/listinfo/general
