The locks themselves do not take up expanded tree cache space. Instead, the query's working set has to fit in the expanded tree cache. It's true that the number of locks is generally proportional to the working set. However, if you were to disable the update function calls in your query, it would no longer take any locks, but it would still need the same amount of expanded tree cache space.
I'm glad to hear that you are familiar with PL/SQL batched updates. That PL/SQL processing model implies an external process controller: the PL/SQL program manages the SQL statements, which perform the updates, and manages the commit interval around them. With XQuery, a similar model would also require a controlling program. You can do this by writing XQuery to manage XQuery (for example, xdmp:spawn a module that updates one batch), or you can use XCC/Java to manage XQuery (e.g., the previously mentioned Corb), or HTTP requests, or XCC/.NET, etc.

-- Mike

On 2010-09-23 23:07, Zegarek, Arthur wrote:
> Michael-
>
> Breaking up the updates into smaller batches is definitely possible, but....
>
> Are you saying that holding the locks is causing the expanded tree cache to fill up?
>
> "I like to use batches of 500 or 1000" - My natural inclination is to try and draw a parallel to SQL processing. I often have to do this type of update against Oracle DBs, and my normal practice would be to write a similar PL/SQL routine which would commit every 500-1000 rows or so to release locks, especially if doing this type of update activity on an OLTP-type system.
>
> Is there a way to interject a "commit" within the XQuery which would have the same effect (hopefully then allowing the processing to continue to completion without the need to break it up into multiple calls)? What would that look like in code? I've always heard references to the fact that ML has transactional commit/rollback capability, but in practice my observation has been that the updates seem to commit immediately.
>
> Thanks in advance-
>
> Art
>
>
> -----Original Message-----
> From: Michael Blakeley [mailto:[email protected]]
> Sent: Friday, September 24, 2010 12:59 AM
> To: General Mark Logic Developer Discussion
> Cc: Zegarek, Arthur
> Subject: Re: [MarkLogic Dev General] XDMP-EXPNTREECACHEFULL on ML 3.2
>
> Arthur, I'm sorry to see that you wrote that much code while looking for a solution. Sometimes it's helpful to search for an error message if it's new to you: http://www.google.com/search?q=XDMP-EXPNTREECACHEFULL and http://marklogic.markmail.org/search/?q=XDMP-EXPNTREECACHEFULL are good places to start. The short story is that the query's working set has to fit in the expanded tree cache.
>
> Moving on to remedies, tuning the in-memory tree size will not affect XDMP-EXPNTREECACHEFULL. If you want to try tuning the server, then tune the expanded tree cache size. However, it's usually better to tune the query. XDMP-EXPNTREECACHEFULL usually means that your query is over-ambitious. The query might not be using indexes efficiently, or it might simply be a query with a gigantic working set.
>
> I see that this is an update query. ACID properties require a read lock on every document read by the query and a write lock on every document that is updated. If you expect to have 300k of these documents for the live system, then I would recommend breaking the work up into smaller transactions. While it is possible to modify 300k (or more) documents in one transaction, it is usually more efficient to modify a batch of documents at a time. I like to use batches of 500 or 1000. Besides performance concerns, this technique is helpful when you encounter an error in the 299,999th document: you only have to reprocess that batch.
>
> Finally, you might be interested in http://marklogic.github.com/corb/ which is intended to help automate this sort of bulk update.
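To make the batch-controller idea concrete, here is a minimal XQuery sketch of the xdmp:spawn approach described above. The module URI (/use-cases/update-batch.xqy), the external variable name ($prod-ids), and the batch size are illustrative assumptions, not code from this thread; the spawned module would have to be written separately to declare that variable, tokenize it, and perform the node-replace/node-insert-after logic for just those products.

    (: Controller: walk the control document in batches and spawn one task per batch. :)
    let $batch-size := 500
    let $items := doc("DevWithoutExclProduct.xml")/prods/product
    let $batch-count := xs:integer(ceiling(count($items) div $batch-size))
    for $i in (1 to $batch-count)
    let $first := (($i - 1) * $batch-size) + 1
    let $batch := $items[position() = ($first to $first + $batch-size - 1)]
    return
      (: Each spawned task is queued on the task server and evaluated as its
         own transaction, so its locks are released when that batch commits. :)
      xdmp:spawn("/use-cases/update-batch.xqy",
                 (xs:QName("prod-ids"),
                  string-join(for $p in $batch/prod_id return string($p), ",")))

Because every batch commits independently, a failure near the end only requires re-running that one batch, which is the practical equivalent of the periodic PL/SQL commit discussed above.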
>
> -- Mike
>
> On 2010-09-23 20:14, Zegarek, Arthur wrote:
>> I am getting an XDMP-EXPNTREECACHEFULL error – not sure how to get around it.
>>
>> Trying to write an XQuery that reads through a control list to obtain a list of catalog elements that require updating, along with a start_date/end_date value to update in the main catalog.
>>
>> When I load the control XML (DevWithoutExclProduct.xml) up with more than 2000 items or so, I get XDMP-EXPNTREECACHEFULL. The in-memory tree size is set to 1 GB.
>>
>> I have 3 versions of the code – details below. I would think the 2nd or 3rd versions would not incur the problem, since in those versions I isolate the logic in a function that is called with just a single node each time. I understand the issue is too many nodes being kept in scope, but how can you get around this in a single XQuery call, without breaking up the data to make multiple calls to ML Server? If I limit DevWithoutExclProduct.xml to 1500 products or so, it runs through without the exception.
>>
>> Currently running this in our dev environment, where we have approximately 54,000 products in the Internal collection. In Prod it is more like 300,000.
>>
>> Version 1:
>>
>> declare namespace RSUITE="http://www.reallysi.com"
>> declare namespace adbl="http://www.audible.com/publisherToRepository"
>>
>> for $excl in doc("DevWithoutExclProduct.xml")/prods/product,
>>     $rsuite in collection("Internal")/product
>>
>> let $excl_product := $rsuite/adbl:METADATA/adbl:CORE/adbl:EXCLUSIVE_PRODUCT
>> let $prod_id := $excl/prods/product/prod_id
>>
>> let $excl_content := $rsuite/adbl:METADATA/adbl:CORE/adbl:EXCLUSIVE_CONTENT
>> let $excl_product := $rsuite/adbl:METADATA/adbl:CORE/adbl:EXCLUSIVE_PRODUCT
>>
>> let $exi := exists($excl_product)
>>
>> let $start := $excl/start
>> let $end := $excl/end
>>
>> let $repl_node :=
>>   <adbl:EXCLUSIVE_PRODUCT><adbl:START_DATE>{$start/text()}</adbl:START_DATE><adbl:END_DATE>{$end/text()}</adbl:END_DATE></adbl:EXCLUSIVE_PRODUCT>
>>
>> where $rsuite/adbl:METADATA/adbl:CORE/adbl:ID/text() = $excl/prod_id/text()
>>
>> return
>>   <a>
>>   {
>>     if ($exi = true())
>>     then xdmp:node-replace($excl_product, $repl_node)
>>     else xdmp:node-insert-after($excl_content, $repl_node)
>>   }
>>   </a>
>>
>> In this version, the control list, DevWithoutExclProduct.xml, is joined in the same for loop as the main catalog.
>> The error returned is:
>> XDMP-EXPNTREECACHEFULL: for $rsuite as item()* in collection("Internal")/child::product -- Expanded tree cache full on host rsuite.ofc.dev.ewr.audible.com
>> line 4
>> =
>> /use-cases/eval2.xqy line 2
>>
>> Version 2 – Here I tried isolating the functionality in a function and calling that function in a separate for loop that reads through the control list. So I am unclear why I am still getting the tree cache full error, given that the function is called with just a single node each time. Note the difference in the error reported – here there are 2 lines mentioned.
>>
>> declare namespace RSUITE="http://www.reallysi.com"
>> declare namespace adbl="http://www.audible.com/publisherToRepository"
>>
>> define function update_excl_prod( $excl as node() ) as element()  (: Call with just 1 node ! :)
>> {
>>   for $rsuite in collection("Internal")/product
>>
>>   let $excl_product := $rsuite/adbl:METADATA/adbl:CORE/adbl:EXCLUSIVE_PRODUCT
>>   let $prod_id := $excl/prods/product/prod_id
>>
>>   let $excl_content := $rsuite/adbl:METADATA/adbl:CORE/adbl:EXCLUSIVE_CONTENT
>>   let $excl_product := $rsuite/adbl:METADATA/adbl:CORE/adbl:EXCLUSIVE_PRODUCT
>>
>>   let $exi := exists($excl_product)
>>
>>   let $start := $excl/start
>>   let $end_dt := $excl/end
>>
>>   let $repl_node :=
>>     <adbl:EXCLUSIVE_PRODUCT><adbl:START_DATE>{$start/text()}</adbl:START_DATE><adbl:END_DATE>{$end_dt/text()}</adbl:END_DATE></adbl:EXCLUSIVE_PRODUCT>
>>
>>   where $rsuite/adbl:METADATA/adbl:CORE/adbl:ID/text() = $excl/prod_id/text()
>>
>>   return
>>     if ($exi = true())
>>     then xdmp:node-replace($excl_product, $repl_node)
>>     else xdmp:node-insert-after($excl_content, $repl_node)
>> }
>>
>> <result>{
>>   for $excl in doc("DevWithoutExclProduct.xml")/prods/product
>>   return update_excl_prod( $excl )
>> }
>> </result>
>>
>> The error returned here is:
>> XDMP-EXPNTREECACHEFULL: for $rsuite as item()* in collection("Internal")/child::product -- Expanded tree cache full on host rsuite.ofc.dev.ewr.audible.com
>> line 7
>> =
>> line 34
>> =
>> /use-cases/eval2.xqy line 2
>>
>> Version 3 – Here I tried limiting the for expression to search for the item and removed the where clause. Same exception.
>>
>> declare namespace RSUITE="http://www.reallysi.com"
>> declare namespace adbl="http://www.audible.com/publisherToRepository"
>>
>> define function update_excl_prod( $excl as node() ) as element()  (: Call with just 1 node ! :)
>> {
>>   for $rs in fn:collection('Internal')/product/adbl:METADATA/adbl:CORE/adbl:ID[. = $excl/prod_id/text()]
>>
>>   let $rsuite := doc(xdmp:node-uri($rs))/product
>>
>>   let $excl_product := $rsuite/adbl:METADATA/adbl:CORE/adbl:EXCLUSIVE_PRODUCT
>>   let $prod_id := $excl/prods/product/prod_id
>>
>>   let $excl_content := $rsuite/adbl:METADATA/adbl:CORE/adbl:EXCLUSIVE_CONTENT
>>   let $excl_product := $rsuite/adbl:METADATA/adbl:CORE/adbl:EXCLUSIVE_PRODUCT
>>
>>   let $exi := exists($excl_product)
>>
>>   let $start := $excl/start
>>   let $end_dt := $excl/end
>>
>>   let $repl_node :=
>>     <adbl:EXCLUSIVE_PRODUCT><adbl:START_DATE>{$start/text()}</adbl:START_DATE><adbl:END_DATE>{$end_dt/text()}</adbl:END_DATE></adbl:EXCLUSIVE_PRODUCT>
>>
>>   (: where $rsuite/adbl:METADATA/adbl:CORE/adbl:ID/text() = $excl/prod_id/text() :)
>>
>>   return
>>     if ($exi = true())
>>     then xdmp:node-replace($excl_product, $repl_node)
>>     else xdmp:node-insert-after($excl_content, $repl_node)
>> }
>>
>> <result>{
>>   for $excl in doc("DevWithoutExclProduct.xml")/prods/product
>>   return update_excl_prod( $excl )
>> }
>> </result>
>>
>>
>> Art
>>
>>
>> Art Zegarek | Director of Data Architecture
>> T: 973.820.0396  F: 973.820.0505  C: 732-735-2592
>>
>> audible.com
>> 1 Washington Park, 16th Floor, Newark, NJ 07102
>>
>
> _______________________________________________
> General mailing list
> [email protected]
> http://developer.marklogic.com/mailman/listinfo/general
