An additional caveat is that if you want and-ed terms to be found in the
same element, and not just in the same document (or fragment, more
precisely), you need to take additional steps. cts:element-query is
designed for this, I think. In the past, I ran into scalability
problems with this approach, but this was in the 3.X series, and I am
hoping (my fingers are crossed so hard they are practically bleeding)
this may have been sorted out better in 4.X. I'll be running some tests
in the next few days; let me know if you're interested in the result.
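
For what it's worth, here is a minimal sketch of the element-scoped version. The element name "text", the sample phrase, and searching over fn:doc() are assumptions made for illustration only; if your elements are in a namespace you would need a namespace-qualified QName instead.

```xquery
xquery version "1.0-ml";

(: Assumption: every word must occur inside the same <text> element.
   xs:QName("text") names an element in no namespace; adjust as needed. :)
let $term := "aspirin codeine"
let $words := fn:tokenize($term, " ")
return
  cts:search(
    fn:doc(),
    cts:element-query(
      xs:QName("text"),
      (: and-query of one word-query per token, scoped to the element :)
      cts:and-query(
        for $word in $words
        return cts:word-query($word)
      )
    )
  )
```

Without the cts:element-query wrapper, the same cts:and-query would also match fragments where the words sit in different elements.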
-Mike
Geert Josten wrote:
Hi David,
To match exact element contents: use cts:element-value-query. But this
requires an element range index.
To match on a phrase within documents: just use cts:word-query with
the whole phrase as one of the search terms.
To match any one or more words of a phrase within documents: you could
use cts:word-query(tokenize($term, " "))
To match all words of a phrase, in any order, within documents:
you could use:
    cts:and-query(
      for $word in tokenize($term, " ")
      return
        cts:word-query($word)
    )
Note: these queries are data types of their own. You can store them in
variables and combine them step by step into larger queries, then
pass the final result to a single cts:search call.
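
As a rough sketch of that idea (the sample phrase, the variable names, and searching over fn:doc() are assumptions for illustration, not taken from your data):

```xquery
xquery version "1.0-ml";

let $term := "two aspirin"
let $words := fn:tokenize($term, " ")
(: the whole phrase, words adjacent and in order :)
let $phrase := cts:word-query($term)
(: all of the words, in any order, anywhere in the fragment :)
let $all-words := cts:and-query(
  for $word in $words
  return cts:word-query($word)
)
(: combine the stored queries into one larger query :)
let $query := cts:or-query(($phrase, $all-words))
(: one cts:search call; the predicate takes the first page of ten results :)
return cts:search(fn:doc(), $query)[1 to 10]
```

Because the queries are ordinary values, you can build them in separate functions and only decide at the end how to combine them.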
HTH!
Kind regards,
Geert
Drs. G.P.H. Josten
Consultant
Daidalos BV
Source of Innovation
Hoekeindsehof 1-4
2665 JZ Bleiswijk
Tel.: +31 (0) 10 850 1200
Fax: +31 (0) 10 850 1199
www.daidalos.nl <http://www.daidalos.nl/>
KvK 27164984
------------------------------------------------------------------------
From: [email protected]
[mailto:[email protected]] On Behalf Of Lee, David
Sent: Monday, 16 November 2009 20:17
To: [email protected]
Subject: [MarkLogic Dev General] Analize text for matches.
I think I've achieved a 1% familiarity with MarkLogic now! It's
exciting!
Now to get to 2% I have a question which is either obvious or
impossible :) OK, 2 questions!
I've been playing with cts:search(). Amazing. I can search 1GB
of XML for matches and get results in < .2 secs. Wow.
OK, now that I'm impressed I want more!
Q1: (easy I hope)
How can I search for a match on not just an exact term but an
element's total text content?
For example I'm doing this query
    cts:search(/hl7:document, cts:word-query($term, "exact"))
If term is, say, "AMA" then it matches an "AMA" in any element, even
if it's part of a sentence like
    <text>Published by the AMA</text>
This is great. But what if I want an exact total text
equality match, that is, only match
    <some_element>AMA</some_element>
The XPath would be
    //node()[text() eq $term]
Is there a cts:search for that? Or should I just use the XPath?
I like the cts:search() because then I can do pagination on it
efficiently.
Q2: (harder)
Suppose I do a search or other extraction and get a string like
"Take two aspirin or codeine and call me in the
morning"
I would like to search this entire string and find matches where
any word in the string is an exact match to some criteria.
A pseudo code might be
    for $word in fn:tokenize($text, " ")
    return cts:search(fn:doc(), cts:word-query($word))
But I'm sure that's slow as hell. Is there a direct function
to do this kind of thing? I can't find one.
For example I might locate a match for "aspirin" and "codeine".
Similar to Q1: I'd like to restrict these matches to whole exact
matches so I don't find things like "take" and "morning".
Thanks for any suggestions on where to look !
----------------------------------------
David A. Lee
Senior Principal Software Engineer
Epocrates, Inc.
[email protected] <mailto:[email protected]>
812-482-5224
------------------------------------------------------------------------
_______________________________________________
General mailing list
[email protected]
http://xqzone.com/mailman/listinfo/general