Hi David,
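To match exact element contents: use cts:element-value-query. It does not need
an element range index (range indexes are needed for cts:element-range-query and
for value lexicons such as cts:element-values), but you do have to name the
element, or a sequence of elements, to check.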
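For example, a minimal sketch. The namespace URI urn:hl7-org:v3 and the element name hl7:title are assumptions, so substitute the ones your documents actually use:

```xquery
xquery version "1.0-ml";

(: Assumption: hl7 namespace URI; replace with the one your data uses :)
declare namespace hl7 = "urn:hl7-org:v3";

let $term := "AMA"
(: Matches hl7:title elements whose whole text content equals $term, so
   <hl7:title>AMA</hl7:title> matches but
   <hl7:title>Published by the AMA</hl7:title> does not.
   hl7:title is a hypothetical element name. :)
let $query := cts:element-value-query(xs:QName("hl7:title"), $term, "exact")
return cts:search(/hl7:document, $query)
```

To check several elements at once, pass a sequence of QNames as the first argument.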
To match a phrase within documents: just use cts:word-query with the whole
phrase as a single search term.
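A minimal sketch, assuming the urn:hl7-org:v3 namespace for the hl7 prefix:

```xquery
xquery version "1.0-ml";

(: Assumption: hl7 namespace URI; replace with the one your data uses :)
declare namespace hl7 = "urn:hl7-org:v3";

(: A multi-word string is treated as a phrase: the words must occur
   next to each other, in this order, somewhere in the document :)
cts:search(/hl7:document, cts:word-query("call me in the morning"))
```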
To match any one or more words of a phrase within documents, you could use:
cts:word-query(tokenize($term, " "))
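Applied to your aspirin sentence, a minimal sketch (again assuming the urn:hl7-org:v3 namespace for the hl7 prefix):

```xquery
xquery version "1.0-ml";

(: Assumption: hl7 namespace URI; replace with the one your data uses :)
declare namespace hl7 = "urn:hl7-org:v3";

let $text := "Take two aspirin or codeine and call me in the morning"
(: A sequence of strings gives OR semantics: a document matches if it
   contains any one of the words. "exact" switches off stemming and
   wildcarding and makes matching case-, diacritic-, punctuation- and
   whitespace-sensitive. :)
let $query := cts:word-query(fn:tokenize($text, "\s+"), "exact")
return cts:search(/hl7:document, $query)
```

Note that common words such as "take" and "morning" will match as well. If you only want hits where a word is the whole value of an element, pass the same word sequence to cts:element-value-query on the element that holds, for instance, drug names.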
To match all words of a phrase, in any order, within documents, you could use:
cts:and-query(
  for $word in tokenize($term, " ")
  return cts:word-query($word)
)
Note: these queries are values in their own right (of type cts:query). You can
store them in variables and combine them step by step into larger queries, and
finally pass the result to a single cts:search call.
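A minimal sketch of building a query in steps and paging through the results. The namespace URI and the element name hl7:drug are hypothetical; xdmp:estimate gives a fast, index-based count you can use for the pager:

```xquery
xquery version "1.0-ml";

(: Assumption: hl7 namespace URI; replace with the one your data uses :)
declare namespace hl7 = "urn:hl7-org:v3";

let $text := "Take two aspirin or codeine and call me in the morning"
let $words := fn:tokenize($text, "\s+")

(: Each piece is an ordinary cts:query value held in a variable.
   hl7:drug is a hypothetical element holding one drug name, so only
   elements whose whole value equals one of the words match. :)
let $drugs := cts:element-value-query(xs:QName("hl7:drug"), $words, "exact")
let $phrase := cts:word-query("call me in the morning")

(: Combine the pieces and run a single search :)
let $query := cts:and-query(($drugs, $phrase))

(: Page 1, 10 results per page :)
let $page := 1
let $page-size := 10
let $start := ($page - 1) * $page-size + 1
return (
  xdmp:estimate(cts:search(/hl7:document, $query)),
  cts:search(/hl7:document, $query)[$start to $start + $page-size - 1]
)
```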
HTH!
Kind regards,
Geert
Drs. G.P.H. Josten
Consultant
Daidalos BV
Source of Innovation
Hoekeindsehof 1-4
2665 JZ Bleiswijk
Tel.: +31 (0) 10 850 1200
Fax: +31 (0) 10 850 1199
www.daidalos.nl
Chamber of Commerce (KvK) 27164984
The information sent in or with this email message comes from Daidalos BV and
is intended solely for the addressee. If you have received this message in
error, please delete it. No rights can be derived from this message.
________________________________
From: [email protected]
[mailto:[email protected]] On Behalf Of Lee, David
Sent: Monday, 16 November 2009 20:17
To: [email protected]
Subject: [MarkLogic Dev General] Analyze text for matches.
I think I've achieved 1% familiarity with MarkLogic now! It's exciting!
Now, to get to 2%, I have a question which is either obvious or impossible :)
OK, two questions!
I've been playing with cts:search(). Amazing. I can search 1 GB of XML for
matches and get results in under 0.2 seconds. Wow.
OK, now that I'm impressed, I *want more!*
Q1: (easy I hope)
How can I search not only for an exact term but for an element's entire text
content? For example, I'm doing this query:
cts:search(/hl7:document , cts:word-query( $term , "exact" ) )
If $term is, say, "AMA", then it matches "AMA" in any element, even if it's
part of a sentence like
<text>Published by the AMA</text>
This is great. But what if I want an *exact total text equality* match, that
is, only match
<some_element>AMA</some_element>
The XPath would be
//*[. eq $term]
Is there a cts:search for that? Or should I just use the XPath?
I like cts:search() because it lets me paginate efficiently.
Q2: (harder)
Suppose I do a search or other extraction and get a string like
"Take two aspirin or codeine and call me in the morning"
I would like to search this entire string and find matches where any word in
the string is an exact match to some criteria.
In pseudocode it might be:
for $word in fn:tokenize($text, " ")
return cts:search(fn:doc(), cts:word-query($word))
But I'm sure that's slow as hell. Is there a direct function to do this
kind of thing? I can't find one.
For example I might locate a match for "aspirin" and "codeine".
As with Q1, I'd like to restrict these to whole, exact matches so that I
don't find things like "take" and "morning".
Thanks for any suggestions on where to look !
----------------------------------------
David A. Lee
Senior Principal Software Engineer
Epocrates, Inc.
[email protected]
812-482-5224
_______________________________________________
General mailing list
[email protected]
http://xqzone.com/mailman/listinfo/general