Geert,

I fully understand the issues around finally converting to XML; as I said,
my example is a trivial, unrealistic one, and indeed vastly over-simplified.
The real process has to clean up pseudo-XML into correct, well-formed XML
(and to do other complex replacements).

I had not considered the option of simply shadowing the variable with a new
one of the same name, and I can see that approach might achieve the same
effect as xdmp:set(). In both cases, though, I would still like some
reassurance from people who know how it all works under the hood that either
(or both) of these techniques will avoid memory issues.
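
For what it's worth, my understanding of the xdmp:set() variant is that it
would keep a single variable and overwrite it in place, roughly like this
(just a sketch with placeholder patterns and a made-up URI, not the real
clean-up rules):

    (: placeholder URI and patterns, for illustration only :)
    let $Text := fn:string(fn:doc('/in/pseudo-xml/doc1.txt'))
    let $_ := xdmp:set($Text, fn:replace($Text, 'pattern-1', 'fix-1'))
    let $_ := xdmp:set($Text, fn:replace($Text, 'pattern-2', 'fix-2'))
    return xdmp:unquote($Text)

(Whether and when those lets actually get evaluated is presumably part of
the "under the hood" behaviour I would like confirmed.)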

I would not want to treat these as multiple steps, because "multiple" would
in this case mean hundreds of steps. I am sure there would be a performance
concern there, as well as a maintenance headache!

Neil.



-----Original Message-----
From: [email protected]
[mailto:[email protected]] On Behalf Of Geert Josten
Sent: 09 October 2009 07:57
To: General Mark Logic Developer Discussion
Subject: RE: [MarkLogic Dev General] Text Updates Garbage Collection?

Hi Neil,

First of all, you end your code with an xdmp:unquote, after having done
search/replaces that treat the text as plain text rather than XML. That is
rather risky. I hope your search patterns are a bit smarter than the ones
you give as examples; you don't want to be replacing parts of element or
attribute names by mistake when you intend to perform replacements in
character data. But perhaps your example is just an oversimplified version
of what you are actually doing.

Secondly, XQuery allows a let clause to rebind an existing variable name;
each new binding simply shadows the previous one. So you could write:

        let $Text := '....bla...'
        let $Text := replace($Text, 'bla', 'BLA')
        ...

So you don't need to bother with xdmp:set. ;-)
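
And if the list of replacements runs into the hundreds, you do not even need
hundreds of let clauses: you can drive them from a sequence of patterns with
a small recursive function. A rough sketch (the function name and patterns
are made up, and the input must of course end up as well-formed XML for the
final unquote to work):

        (: illustrative helper, not production code :)
        declare function local:replace-all(
          $text as xs:string,
          $patterns as xs:string*,
          $replacements as xs:string*
        ) as xs:string
        {
          if (fn:empty($patterns)) then $text
          else local:replace-all(
            fn:replace($text, $patterns[1], $replacements[1]),
            fn:subsequence($patterns, 2),
            fn:subsequence($replacements, 2))
        };

        let $Text := '<doc>...bla...</doc>'
        let $Text := local:replace-all($Text, ('bla', 'foo'), ('BLA', 'FOO'))
        return xdmp:unquote($Text)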

Moreover, I am not sure whether Mark Logic actually implements lets as
variables, which makes it hard to predict what really happens when the query
is executed. Those lets could just as well be treated as macro definitions
that are expanded at the syntax parsing stage. Someone from Mark Logic will
have to confirm this. There are optimizations applied as well, so there is
only one real option: do tests and measure.
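
For the measuring part, xdmp:elapsed-time() can already give a first
impression of where the time goes. A quick sketch (the generated input is
just a stand-in for your large documents):

        (: generated stand-in input, for illustration only :)
        let $Text := fn:string-join((for $i in 1 to 100000 return 'bla '), '')
        let $Text := fn:replace($Text, 'bla', 'BLA')
        let $Text := fn:replace($Text, 'BLA', 'BLAH')
        return (fn:string-length($Text), xdmp:elapsed-time())

Memory consumption itself you would have to watch separately, for instance
on the server status pages, while such a test runs.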

Last but not least: if you have many search/replace operations and worry
that doing them in a single pass would exhaust memory, then you could
convert it to a multi-step process and use the Content Processing Framework
(CPF) to tie the separate steps together. It might be worth your while to
take a look at CPF anyhow; it is really useful for import processes.
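
A single step in such a process does not have to be complicated either.
Deliberately not CPF-specific (a CPF action, or for that matter a scheduled
task, could wrap something like it), and with an invented URI and pattern:

        let $uri := '/in/pseudo-xml/doc1.txt'  (: invented URI :)
        let $text := fn:string(fn:doc($uri))
        let $text := fn:replace($text, 'bla', 'BLA')  (: one batch of replaces per step :)
        return xdmp:document-insert($uri, text { $text })

The intermediate result is then persisted between steps instead of all the
replacements being held in memory in one go.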

Kind regards,
Geert



Drs. G.P.H. Josten
Consultant


http://www.daidalos.nl/
Daidalos BV
Source of Innovation
Hoekeindsehof 1-4
2665 JZ Bleiswijk
Tel.: +31 (0) 10 850 1200
Fax: +31 (0) 10 850 1199
http://www.daidalos.nl/
KvK 27164984
The information sent in or with this e-mail message originates from
Daidalos BV and is intended solely for the addressee. If you have received
this message unintentionally, we kindly ask you to delete it. No rights can
be derived from this message.


> From: [email protected]
> [mailto:[email protected]] On Behalf Of
> Neil Bradley
> Sent: Friday 9 October 2009 12:57
> To: [email protected]
> Subject: [MarkLogic Dev General] Text Updates Garbage Collection?
>
> Hi,
>
>
>
> I want to check if there is likely to be any problem with
> memory exhaustion in the following scenario.
>
>
>
> I will have text documents stored in a MarkLogic database
> that I want to update using a large number of consecutive
> search/replaces, then finally convert to XML.
>
>
>
> It seems obvious to me that I could easily run out of memory
> if I adopt this approach (and have hundreds of replaces
> applied to large text documents). In this trivial example, I
> am simply converting the word "Document" to "DOCUMENT" in
> three steps, which I would obviously do in one step in reality, but
> just to show the method I originally considered...
>
>
>
>     let $Text :=
> ".............................................................
> . (large text document).............................."
>
>     let $NewText1 := fn:replace($Text, "Doc", "DOC")
>
>     let $NewText2 := fn:replace($NewText1, "ume", "UME")
>
>     let $NewText3 := fn:replace($NewText2, "nt", "NT")
>
>     let $XML := xdmp:unquote($NewText3)
>
>     return
>
>       $XML
>
>
>
> I am assuming that each variable contains a variant of the
> text document, so memory will quickly become exhausted.
>
>
>
> However, if I use xdmp:set(), would that solve the problem,
> because the first variable content is being replaced, and the
> later variables have no content at all?...
>
>
>
>     let $Text :=
> ".............................................................
> . (large text document).............................."
>
>     let $NewText1 := fn:replace($Text, "Doc", "DOC")
>
>     let $NewText2 := xdmp:set($NewText1, fn:replace($NewText1, "ume", "UME"))
>
>     let $NewText3 := xdmp:set($NewText1, fn:replace($NewText1, "nt", "NT"))
>
>     let $XML := xdmp:unquote($NewText1)
>
>     return
>
>       $XML
>
>
>
> Or should I still expect the old text to be occupying memory
> (i.e. no garbage collection of strings)?
>
>
>
> Thanks,
>
>
>
> Neil.
>

_______________________________________________
General mailing list
[email protected]
http://xqzone.com/mailman/listinfo/general
