Geert, I fully understand the issues around finally converting to XML. As I said, my example is trivial and unrealistic; it is indeed vastly over-simplified. The real process is required to clean up pseudo-XML into correct, well-formed XML (and to do other complex replaces).

I had not considered the option of simply overriding the variable with a new one of the same name, and I can see that this approach might achieve the same as xdmp:set(). In both cases, though, I would still like some reassurance from people who know how it all works under the hood that either or both of these techniques will prevent memory issues.
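If I have understood the suggestion correctly, my trivial example would then be reduced to something like the following (placeholder patterns again, of course, not the real replaces):

let $Text := "..... (large text document) ....."
let $Text := fn:replace($Text, "Doc", "DOC")  (: each let re-binds $Text, shadowing the previous value :)
let $Text := fn:replace($Text, "ume", "UME")
let $Text := fn:replace($Text, "nt", "NT")
return xdmp:unquote($Text)                    (: finally parse the cleaned-up string as XML :)

The question, as with the xdmp:set() version quoted below, is still whether the earlier values bound to $Text become eligible for garbage collection, or whether hundreds of such re-bindings would keep hundreds of copies of the document in memory.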
On the multi-step suggestion: I would not want to treat these as multiple steps, because "multiple" would in this case mean hundreds of steps. I am sure there would be a performance concern there, as well as a maintenance headache!

Neil.

-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of Geert Josten
Sent: 09 October 2009 07:57
To: General Mark Logic Developer Discussion
Subject: RE: [MarkLogic Dev General] Text Updates Garbage Collection?

Hi Neil,

First of all, you end your code with an xdmp:unquote, while having done search/replaces pretending that the text is plain text, not XML. That is rather risky. I hope your search patterns are a bit smarter than the ones you are giving as examples; you don't want to be replacing parts of element or attribute names by mistake when you intend to perform replacements in character data. But perhaps your example is just an oversimplified version of what you are actually doing.

Secondly, it is allowed to have let statements redefining existing variables. So you could write:

let $Text := '....bla...'
let $Text := replace($Text, 'bla', 'BLA')
...

So, you don't need to bother about xdmp:set. ;-)

Moreover, I am not sure that Mark Logic actually implemented let bindings as variables, which makes it hard to predict what really happens when the query is executed. Those lets could just as well be seen as macro definitions that are replaced in the syntax-parsing stage. Someone from Mark Logic will have to confirm this. There are optimizations applied as well, so there is only one real option: do tests and measure.

Last but not least: if you have many search/replace operations and worry that doing them in a single pass would exhaust memory, you could convert the job to a multi-step process and use the Content Processing Framework to tie the separate steps together. It might be worth your while to take a look at CPF anyhow; it is really interesting for import processes.

Kind regards,
Geert

Drs. G.P.H. Josten
Consultant
Daidalos BV - Source of Innovation
Hoekeindsehof 1-4
2665 JZ Bleiswijk
Tel.: +31 (0) 10 850 1200
Fax: +31 (0) 10 850 1199
http://www.daidalos.nl/
KvK 27164984

The information sent in or with this e-mail message originates from Daidalos BV and is intended solely for the addressee. If you have received this message unintentionally, we kindly ask you to delete it. No rights can be derived from this message.

> From: [email protected] [mailto:[email protected]]
> On Behalf Of Neil Bradley
> Sent: Friday, 9 October 2009 12:57
> To: [email protected]
> Subject: [MarkLogic Dev General] Text Updates Garbage Collection?
>
> Hi,
>
> I want to check if there is likely to be any problem with memory exhaustion in the following scenario.
>
> I will have text documents stored in a MarkLogic database that I want to update using a large number of consecutive search/replaces, then finally convert to XML.
>
> It seems obvious to me that I could easily run out of memory if I adopt this approach (and have hundreds of replaces applied to large text documents).
> In this trivial example, I am simply converting the word "Document" to "DOCUMENT" in three steps, which I would obviously do in one for real, but just to show the method I originally considered...
>
> let $Text := "..... (large text document) ....."
> let $NewText1 := fn:replace($Text, "Doc", "DOC")
> let $NewText2 := fn:replace($NewText1, "ume", "UME")
> let $NewText3 := fn:replace($NewText2, "nt", "NT")
> let $XML := xdmp:unquote($NewText3)
> return
>   $XML
>
> I am assuming that each variable contains a variant of the text document, so memory will quickly become exhausted.
>
> However, if I use xdmp:set(), would that solve the problem, because the first variable's content is being replaced, and the later variables have no content at all?...
>
> let $Text := "..... (large text document) ....."
> let $NewText1 := fn:replace($Text, "Doc", "DOC")
> let $NewText2 := xdmp:set($NewText1, fn:replace($NewText1, "ume", "UME"))
> let $NewText3 := xdmp:set($NewText1, fn:replace($NewText1, "nt", "NT"))
> let $XML := xdmp:unquote($NewText1)
> return
>   $XML
>
> Or would I still expect the old text to be occupying memory (lack of string garbage collection)?
>
> Thanks,
>
> Neil.

_______________________________________________
General mailing list
[email protected]
http://xqzone.com/mailman/listinfo/general
