Kelly, Does that approach work with text documents?
Another issue is that, for reasons I do not want to expand on here, we want to process one document at a time through the step discussed here, along with other prior and following steps, so I am not sure of the benefits of this approach over the fn:replace() function. But it is certainly an interesting alternative.

Neil.

-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of Kelly Stirman
Sent: 09 October 2009 10:57
To: [email protected]
Subject: [MarkLogic Dev General] RE: Text Updates Garbage Collection? (Neil Bradley)

Hi Neil,

Have you thought about using cts:highlight() to do the replacing of your string values? You basically construct a cts:or-query(()) of all the different values you'd like to replace:

let $q := cts:or-query(("Doc","ume","nt"))

Then you call cts:highlight() on the document. Normally you would use cts:highlight() to replace a matching string with some new markup for style, such as a span tag. It turns out you can use it to replace the matching string with whatever you want.

Where cts:highlight() finds a match, you have some useful options. One is the $cts:queries variable, which returns the matching query for the text that is matched.
You can use this with a lookup document like so:

<replace>
  <item from="Doc">DOC</item>
  <item from="ume">UME</item>
  <item from="nt">NT</item>
</replace>

For each match, you'll get back a cts:query, and you can use this to find matches in your replace node, and use the substitution string as the value for the third argument in cts:highlight():

let $doc := <doc>I have some text that includes the words Doc, ume, and nt.</doc>
let $replace :=
  <replace>
    <item from="Doc">DOC</item>
    <item from="ume">UME</item>
    <item from="nt">NT</item>
  </replace>
let $q := cts:or-query(("Doc","ume","nt"))
return cts:highlight($doc,$q,local:replace($cts:queries,$replace))

-->

<doc>I have some text that includes the words DOC, UME, and NT.</doc>

This can be extended with cts:reverse-query() to perform custom enrichment on XML. Rather than having one large or-query() for all the strings you might want to replace, you would store a document with your query and any other useful metadata you wish to associate with the query. For example, if you wanted to do some custom enrichment on drug names, you might have a series of documents like this:

<drug>
  <name type="commercial">Tylenol</name>
  <img type="commercial">/Thumbs/generic/acetamenophin.png</img>
  <name type="generic">Acetamenophin</name>
  <img type="generic">/Thumbs/generic/acetamenophin.png</img>
  <link>http://drugdictionary.com/drugid/j674ui832190</link>
  <query>{cts:or-query((cts:word-query("Tylenol","case-insensitive"),cts:word-query("Acetamenophin","case-insensitive")))}</query>
</drug>

And for each document you want to enrich, you would use the reverse indexes to see which drugs are in the document. This is a much easier approach to manage than an or-query() of thousands of drug names:

cts:search(doc(),cts:reverse-query($new-document))

This would return the matching query documents, and you can then retrieve the queries from these docs and pass them to cts:highlight().
Here's how you might do that:

let $drug-groups := cts:search(doc(),cts:reverse-query($doc))
let $query := cts:or-query((cts:query($drug-groups/drug/query/*)))
return cts:highlight($doc,$query,local:drug-enrich($cts:queries,$drug-groups))

In this case, instead of a single replace document, the new value is one of several pieces of metadata you store with each query. You can write your own function to build elaborate replacement markup. Here's a simple example for the drugs:

declare function local:drug-enrich($query as cts:query,$drug-groups as node()*){
  let $this-drug := $drug-groups/drug/name[cts:contains(.,$query)]
  let $this-type := fn:data($this-drug/@type)
  let $other-type := if($this-type eq "commercial") then "generic" else "commercial"
  let $img := fn:data($this-drug/@img)
  let $link := $this-drug/../link/text()
  let $equivalent := $this-drug/../na...@type eq $other-type]/text()
  return <drug img="{$img}" link="{$link}">{$match} [{$equivalent}]</drug>
};

Kelly

Hi,

I want to check if there is likely to be any problem with memory exhaustion in the following scenario.

I will have text documents stored in a MarkLogic database that I will update using a large number of consecutive search/replaces, then finally convert to XML.

It seems obvious to me that I could easily run out of memory if I adopt this approach (and have hundreds of replaces applied to large text documents). In this trivial example, I am simply converting the word "Document" to "DOCUMENT" in three steps, which I would obviously do in one for real, but just to show the method I originally considered...

let $Text := ".............................................................. (large text document).............................."
let $NewText1 := fn:replace($Text, "Doc", "DOC")
let $NewText2 := fn:replace($NewText1, "ume", "UME")
let $NewText3 := fn:replace($NewText2, "nt", "NT")
let $XML := xdmp:unquote($NewText3)
return $XML

I am assuming that each variable contains a variant of the text document, so memory will quickly become exhausted. However, if I use xdmp:set(), would that solve the problem, because the first variable content is being replaced, and the later variables have no content at all?...

let $Text := ".............................................................. (large text document).............................."
let $NewText1 := fn:replace($Text, "Doc", "DOC")
let $NewText2 := xdmp:set($NewText1, fn:replace($NewText1, "ume", "UME"))
let $NewText3 := xdmp:set($NewText1, fn:replace($NewText1, "nt", "NT"))
let $XML := xdmp:unquote($NewText1)
return $XML

Or should I still expect the old text to be occupying memory (lack of string garbage collection)?

Thanks,

Neil.

-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of [email protected]
Sent: Friday, October 09, 2009 2:27 AM
To: [email protected]
Subject: General Digest, Vol 64, Issue 25

Send General mailing list submissions to
        [email protected]

To subscribe or unsubscribe via the World Wide Web, visit
        http://xqzone.com/mailman/listinfo/general
or, via email, send a message with subject or body 'help' to
        [email protected]

You can reach the person managing the list at
        [email protected]

When replying, please edit your Subject line so it is more specific than "Re: Contents of General digest..."

Today's Topics:

   1. Performance Meters http test configuration (Curtis Wilde)
   2. Re: Performance Meters http test configuration (Michael Blakeley)
   3. Re: Performance Meters http test configuration (Curtis Wilde)
   4. To set threshold for search:search results (mano m)
   5. Text Updates Garbage Collection?
      (Neil Bradley)

----------------------------------------------------------------------

Message: 1
Date: Thu, 8 Oct 2009 16:06:24 -0600
From: Curtis Wilde <[email protected]>
Subject: [MarkLogic Dev General] Performance Meters http test configuration
To: [email protected]
Message-ID: <[email protected]>
Content-Type: text/plain; charset="utf-8"

The performance meters tutorial does a good job at explaining how to execute xcc tests with performance meters, but it is less clear how an http test should work. I've taken a stab at a very simple http test with no success:

<h:script xmlns:h="http://marklogic.com/xdmp/harness">
  <h:test>
    <h:name>login</h:name>
    <h:set-up/>
    <h:tear-down/>
    <h:comment-expected-result><![CDATA[<response status="AUTHENTICATED"/>]]>
    </h:comment-expected-result>
    <h:query><![CDATA[login?username=foo&password=bar]]></h:query>
  </h:test>
</h:script>

The test makes a restful call (login) to a service, which should authenticate the specified user and receive the authenticated status message reply, but this never succeeds. In the address bar of the browser the call looks like:

http://myTestServer:8030/login?username=foo&password=bar

properties file:
checkResults=true
host=myTestServer
port=8030
isRandomTest=false
inputPath=../tests/httptests.xml
numThreads=1
shared=false
readSize=32768
recordResults=true
#reporter=XMLReporter
#outputPath=results.xml
reporter=CSVReporter
outputPath=../reports/
reportTime=true
reportPercentileDuration=95
reportStandardDeviation=true
testTime=0
testType=HTTP
testListClass=com.marklogic.performance.XMLFileTestList

Not sure what I'm doing wrong.
------------------------------

Message: 2
Date: Thu, 08 Oct 2009 15:46:25 -0700
From: Michael Blakeley <[email protected]>
Subject: Re: [MarkLogic Dev General] Performance Meters http test configuration
To: General Mark Logic Developer Discussion <[email protected]>
Message-ID: <[email protected]>
Content-Type: text/plain; charset=UTF-8; format=flowed

Curtis,

Try testType=URI instead. The HTTP test type is more specialized: it posts the <h:query> value to a special "/evaluate.xqy" service on the target host. The idea with that test type is to evaluate arbitrary XQuery expressions.

-- Mike

On 2009-10-08 15:06, Curtis Wilde wrote:
> The performance meters tutorial does a good job at explaining how to execute xcc tests with performance meters, but it is less clear how an http test should work. I've taken a stab at a very simple http test with no success:
>
> <h:script xmlns:h="http://marklogic.com/xdmp/harness">
>   <h:test>
>     <h:name>login</h:name>
>     <h:set-up/>
>     <h:tear-down/>
>     <h:comment-expected-result><![CDATA[<response status="AUTHENTICATED"/>]]>
>     </h:comment-expected-result>
>     <h:query><![CDATA[login?username=foo&password=bar]]></h:query>
>   </h:test>
> </h:script>
>
> The test makes a restful call (login) to a service, which should authenticate the specified user and receive the authenticated status message reply, but this never succeeds.
> In the address bar of the browser the call looks like:
>
> http://myTestServer:8030/login?username=foo&password=bar
>
> properties file:
> checkResults=true
> host=myTestServer
> port=8030
> isRandomTest=false
> inputPath=../tests/httptests.xml
> numThreads=1
> shared=false
> readSize=32768
> recordResults=true
> #reporter=XMLReporter
> #outputPath=results.xml
> reporter=CSVReporter
> outputPath=../reports/
> reportTime=true
> reportPercentileDuration=95
> reportStandardDeviation=true
> testTime=0
> testType=HTTP
> testListClass=com.marklogic.performance.XMLFileTestList
>
> Not sure what I'm doing wrong.

------------------------------

Message: 3
Date: Thu, 8 Oct 2009 18:01:18 -0600
From: Curtis Wilde <[email protected]>
Subject: Re: [MarkLogic Dev General] Performance Meters http test configuration
To: General Mark Logic Developer Discussion <[email protected]>
Message-ID: <[email protected]>
Content-Type: text/plain; charset="utf-8"

Thanks for the guidance, but changing to URI is still unsuccessful. Manually requesting authentication with the browser should return:

<response status="AUTHENTICATED"/>

but I still receive

<response status="NOT_AUTHENTICATED"/>

(http://mytestserver:8030/login?username=foo&password=bar)

This is not a problem with the service since currently any username/password combo will authenticate on our test system. I'll try to monitor the actual request through a proxy or something and see if it's getting mangled.

On Thu, Oct 8, 2009 at 4:46 PM, Michael Blakeley <[email protected]> wrote:
> Curtis,
>
> Try testType=URI instead. The HTTP test type is more specialized: it posts
> the <h:query> value to a special "/evaluate.xqy" service on the target host.
> The idea with that test type is to evaluate arbitrary XQuery expressions.
------------------------------

Message: 4
Date: Thu, 8 Oct 2009 23:16:05 -0700 (PDT)
From: mano m <[email protected]>
Subject: [MarkLogic Dev General] To set threshold for search:search results
To: [email protected]
Message-ID: <[email protected]>
Content-Type: text/plain; charset="iso-8859-1"

Hi,

In a search application, we are performing the following steps:

1. A constant value is set as the threshold. From the search response, we get the total number of results and compare it with the threshold.
2. If the number of search results exceeds the threshold, we display the search results.
3. Otherwise, we perform the "Did You Mean?" search (spell check and auto-correction using a dictionary) and display the result.

Please suggest whether there is a more efficient way to set the threshold than using a constant.

Regards,
Mano

------------------------------

Message: 5
Date: Fri, 9 Oct 2009 11:56:36 +0100
From: "Neil Bradley" <[email protected]>
Subject: [MarkLogic Dev General] Text Updates Garbage Collection?
To: <[email protected]>
Message-ID: <[email protected]>
Content-Type: text/plain; charset="us-ascii"

Hi,

I want to check if there is likely to be any problem with memory exhaustion in the following scenario.

I will have text documents stored in a MarkLogic database that I will update using a large number of consecutive search/replaces, then finally convert to XML.

It seems obvious to me that I could easily run out of memory if I adopt this approach (and have hundreds of replaces applied to large text documents).
In this trivial example, I am simply converting the word "Document" to "DOCUMENT" in three steps, which I would obviously do in one for real, but just to show the method I originally considered...

let $Text := ".............................................................. (large text document).............................."
let $NewText1 := fn:replace($Text, "Doc", "DOC")
let $NewText2 := fn:replace($NewText1, "ume", "UME")
let $NewText3 := fn:replace($NewText2, "nt", "NT")
let $XML := xdmp:unquote($NewText3)
return $XML

I am assuming that each variable contains a variant of the text document, so memory will quickly become exhausted. However, if I use xdmp:set(), would that solve the problem, because the first variable content is being replaced, and the later variables have no content at all?...

let $Text := ".............................................................. (large text document).............................."
let $NewText1 := fn:replace($Text, "Doc", "DOC")
let $NewText2 := xdmp:set($NewText1, fn:replace($NewText1, "ume", "UME"))
let $NewText3 := xdmp:set($NewText1, fn:replace($NewText1, "nt", "NT"))
let $XML := xdmp:unquote($NewText1)
return $XML

Or should I still expect the old text to be occupying memory (lack of string garbage collection)?

Thanks,

Neil.

------------------------------

_______________________________________________
General mailing list
[email protected]
http://xqzone.com/mailman/listinfo/general

End of General Digest, Vol 64, Issue 25
***************************************
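
For the chained fn:replace() approach in the original question, one possible way to avoid writing a numbered variable per step is to keep the replacements as data and apply them recursively. This is only a minimal sketch: local:replace-all is a hypothetical helper that does not appear anywhere in the thread, and it borrows the <replace> lookup format from Kelly's reply. It does not by itself answer the memory question, since each step still produces a new string. Note also that fn:replace() treats the @from value as a regular expression and gives "$" and "\" special meaning in the replacement string.

```xquery
(: Hypothetical helper, not from the thread: applies each <item> in order,
   feeding the result of one fn:replace() call into the next. :)
declare function local:replace-all(
  $text as xs:string,
  $items as element(item)*
) as xs:string
{
  if (fn:empty($items)) then
    (: No replacements left: return the accumulated text. :)
    $text
  else
    local:replace-all(
      fn:replace($text, fn:string($items[1]/@from), fn:string($items[1])),
      fn:subsequence($items, 2)
    )
};

(: Same lookup document as in Kelly's reply. :)
let $replace :=
  <replace>
    <item from="Doc">DOC</item>
    <item from="ume">UME</item>
    <item from="nt">NT</item>
  </replace>
return local:replace-all("A Document.", $replace/item)
```

With the sample input above, the three steps give "A DOCument.", then "A DOCUMEnt.", then "A DOCUMENT.". Adding another replacement then means adding another <item> rather than another let clause.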
