Hi Kari,

MarkLogic uses lazy evaluation, so it could potentially stream over the large file to process it. You could check whether replacing $pacer_doc with a direct evaluation of doc(..) in the FLWOR runs faster, but before you do that, consider rewriting this line:

let $translation:= $xml_doc//firmname[.=$theOrigFirmname]/../translation/text()

You could consider building a map:map out of that first, and then using that in the FLWOR for faster lookups.

Cheers,
Geert

From: <general-boun...@developer.marklogic.com> on behalf of Kari Cowan <kco...@alm.com>
Reply-To: MarkLogic Developer Discussion <general@developer.marklogic.com>
Date: Monday, May 23, 2016 at 8:53 PM
To: MarkLogic Developer Discussion <general@developer.marklogic.com>
Subject: Re: [MarkLogic Dev General] How to handle very large xml file to prevent com.marklogic.xcc.exceptions.XQueryException: Time limit exceeded

The file is used in a different application that I don’t have control over, so I am just adjusting the data that’s in the file – to fix the firmname (correcting some typos and inconsistencies they had and continue to have – I can’t really prevent that, because the service pulls the data from various public court records and every law clerk seems to have their own way of entering the data).

When my script is doing:

for $firms in $pacer_doc//(counsel|party) …

Is there a better way than loading the doc nodes in a for loop – maybe some other function I am not aware of, or another FLWOR?

From: general-boun...@developer.marklogic.com [mailto:general-boun...@developer.marklogic.com] On Behalf Of Geert Josten
Sent: Monday, May 23, 2016 11:44 AM
To: MarkLogic Developer Discussion <general@developer.marklogic.com>
Subject: Re: [MarkLogic Dev General] How to handle very large xml file to prevent com.marklogic.xcc.exceptions.XQueryException: Time limit exceeded

Hi Kari,

13 MB isn’t really big, actually, but it is big enough to perform less than optimally and cause timeouts.
You could just increase the timeout, but it is probably a better idea to revise your strategy and consider breaking your large file into record-like files (each containing just one firm, for instance). You can then make much more use of the search capabilities of MarkLogic.

Cheers,
Geert

From: <general-boun...@developer.marklogic.com> on behalf of Kari Cowan <kco...@alm.com>
Reply-To: MarkLogic Developer Discussion <general@developer.marklogic.com>
Date: Monday, May 23, 2016 at 8:40 PM
To: "general@developer.marklogic.com" <general@developer.marklogic.com>
Subject: [MarkLogic Dev General] How to handle very large xml file to prevent com.marklogic.xcc.exceptions.XQueryException: Time limit exceeded

There must be a better way to do this. My script works fine when it’s loading a document that is not very large, but occasionally one of the docs is massive (13 MB in one of the cases that errored), and when that happens, my application gets an error like:

com.marklogic.xcc.exceptions.XQueryException: Time limit exceeded

The script basically takes a URI, reads the document back, and compares the ‘firmname’ nodes (there can be many in the same document) against shortlist.xml; if a name differs from the shortlist, we change it to what that file says it should be. The problem with my large file is that there are over 72,000 law firms it’s trying to compare.

This is my script – does anyone have a suggestion for a better way to accomplish what I am attempting?

xquery version "1.0-ml";
declare namespace html = "http://www.w3.org/1999/xhtml";
declare variable $uri as xs:string external;

let $uri := try { ($uri) } catch ($e) { "" }
(: let $uri:="/olympus/pacer-xml/9739715_3:15-cv-01221" :)
let $xml_doc:=fn:doc("/olympus/data-utils/standard_firmnames_shortlist.xml")
for $this_uri in "$uri"
let $doc := fn:doc($uri)
let $pacer_doc:=$doc
for $firms in $pacer_doc//(counsel|party)
let $theOrigFirmname:= $firms/originalFirmname
let $theFirmname:= $firms/firmname
let $translation:= $xml_doc//firmname[.=$theOrigFirmname]/../translation/text()
for $firm in $pacer_doc
return
  if( fn:exists($translation) and fn:exists($theFirmname) and ($translation ne $theFirmname ) ) then (
    fn:concat("CHANGING FIRMNAME: ",$theFirmname, " TO STANDARD FIRMNAME TRANSLATION: ",$translation, " IN URI: " ,$uri),
    xdmp:log(fn:concat("Olympotomus Changed Firmname: ",$theFirmname, " in URI: " ,$uri)),
    xdmp:node-replace($theFirmname,<firmname>{$translation}</firmname>)
  )
  else (
    fn:concat("...Evaluated and did not change Firmname: ",$theFirmname, " in URI: " ,$uri),
    xdmp:log(fn:concat("Olympotomus Evaluated and did not change a Firmname: ",$theFirmname, " in URI: " ,$uri))
  )

________________________________
ALM, an information and intelligence company, provides customers with critical news, data, analysis, marketing solutions and events to successfully manage the business of business. Customers use ALM solutions to discover new ideas and approaches for solving business challenges, connect to the right professionals and peers to move business forward, and compete to win through access to data, analytics and insight. ALM serves a community of over six million business professionals seeking to discover, connect and compete in highly complex industries. Learn more at www.alm.com.
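
[Editor's sketch] Geert's map:map suggestion, applied to the script above, might look roughly like the following. This is a minimal sketch and not code from the thread. It assumes, as the original XPath does, that each firmname in the shortlist shares a parent element with its translation, and it assumes at most one originalFirmname and one firmname per counsel or party element. The variable names $shortlist, $translations, $entry, $current and $standard are illustrative. The two single-iteration loops from the original (for $this_uri in "$uri", which iterates once over a literal string, and for $firm in $pacer_doc) are left out.

```xquery
xquery version "1.0-ml";

declare variable $uri as xs:string external;

(: Build the lookup table once: original firm name -> standard translation.
   Entries without a translation are skipped so that a lookup never
   returns an empty string. :)
let $shortlist := fn:doc("/olympus/data-utils/standard_firmnames_shortlist.xml")
let $translations := map:map()
let $_ :=
  for $entry in $shortlist//firmname[../translation]
  return map:put($translations, fn:string($entry),
                 fn:string(($entry/../translation)[1]))

(: One pass over the large document; each lookup is a hash access instead of
   a fresh scan of the whole shortlist. :)
for $firm in fn:doc($uri)//(counsel|party)
let $current := ($firm/firmname)[1]
let $standard := map:get($translations, fn:string(($firm/originalFirmname)[1]))
where fn:exists($current)
  and fn:exists($standard)
  and $standard ne fn:string($current)
return (
  xdmp:log(fn:concat("Changed Firmname: ", fn:string($current), " in URI: ", $uri)),
  xdmp:node-replace($current, <firmname>{$standard}</firmname>)
)
```

The original expression re-evaluates $xml_doc//firmname[...] for every counsel or party element, so the work grows with the number of firms in the document multiplied by the size of the shortlist. With the map, the shortlist is walked once. Whether this alone brings the request under the time limit would have to be measured.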
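
[Editor's sketch] For the stopgap Geert mentions, raising the timeout, one option is to call xdmp:set-request-time-limit at the start of the module. The value of 600 seconds below is purely illustrative, and the time limit that can be set this way is still capped by the maximum configured on the app server. The count at the end is only a stand-in for the real processing.

```xquery
xquery version "1.0-ml";

declare variable $uri as xs:string external;

(
  (: Allow this one request to run for up to 600 seconds (illustrative value). :)
  xdmp:set-request-time-limit(600),

  (: Stand-in for the actual work: count the elements the script would visit. :)
  fn:count(fn:doc($uri)//(counsel|party))
)
```

As Geert notes, this only buys time; it does not reduce the amount of work the script does per document.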