An additional caveat is that if you want and-ed terms to be found in the same element, and not just in the same document (or fragment, more precisely), you need to take additional steps. cts:element-query is designed for this, I think. In the past, I ran into scalability problems with this approach, but this was in the 3.X series, and I am hoping (my fingers are crossed so hard they are practically bleeding) this may have been sorted out better in 4.X. I'll be running some tests in the next few days; let me know if you're interested in the result.
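The sketch below shows the cts:element-query approach Mike mentions. It is not a tested recipe. It assumes David's example terms "aspirin" and "codeine", and it takes the element name `text` from his message. If your documents use a namespace, build the QName with fn:QName(uri, "text") instead. Wrapping the and-query this way requires both words to occur inside one and the same `text` element, not just anywhere in the fragment. It says nothing about the 3.x/4.x scalability question Mike raises.

```xquery
xquery version "1.0-ml";

(: Both terms must occur inside a single <text> element.
   The element name "text" is an assumption taken from David's example. :)
declare variable $same-element :=
  cts:element-query(
    xs:QName("text"),
    cts:and-query((
      cts:word-query("aspirin", "exact"),
      cts:word-query("codeine", "exact")
    ))
  );

(: First page of ten matching documents from the whole database. :)
cts:search(fn:collection(), $same-element)[1 to 10]
```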

-Mike

Geert Josten wrote:
Hi David,
To match exact element contents: use cts:element-value-query. But this requires an element range index.
To match on a phrase within documents: just use cts:word-query with the whole phrase as one of the search terms.
To match any one or more words of a phrase within documents: you could use cts:word-query(tokenize($term, " ")).
To match all words of a phrase, but in any order, within documents: you could use:
cts:and-query(
   for $word in tokenize($term, " ")
   return
       cts:word-query($word)
)
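Below, the three word-query variants are written out side by side as a sketch. `$term` is a hypothetical example value. Passing a sequence of strings to one cts:word-query OR-s them. Passing a multi-word string matches the words as a phrase. The final line runs only the all-words variant against every document in the database.

```xquery
xquery version "1.0-ml";

(: Hypothetical example value. :)
declare variable $term := "call me in the morning";

(: Whole phrase: the words must appear together, in this order. :)
declare variable $phrase := cts:word-query($term);

(: Any word: several strings in one word-query are OR-ed. :)
declare variable $any-word := cts:word-query(fn:tokenize($term, " "));

(: All words, in any order: one word-query per word, AND-ed together. :)
declare variable $all-words :=
  cts:and-query(
    for $word in fn:tokenize($term, " ")
    return cts:word-query($word)
  );

cts:search(fn:collection(), $all-words)[1 to 10]
```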
Note: these queries are data types of their own. You can store them in variables and combine them gradually into larger queries, then pass the result to a single cts:search call.
HTH!
Kind regards,
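Here is a sketch of that incremental style, applied to both of David's questions. It makes three assumptions. First, the `hl7` prefix is bound to `urn:hl7-org:v3`, the usual HL7 v3/CDA namespace; use whatever URI your documents actually use. Second, `some_element` is the hypothetical element name from Q1. Third, the two queries are AND-ed only to demonstrate composition, not because that combination is what David needs.

```xquery
xquery version "1.0-ml";

(: Assumption: replace with the namespace URI your documents use. :)
declare namespace hl7 = "urn:hl7-org:v3";

declare variable $text :=
  "Take two aspirin or codeine and call me in the morning";

(: Q1: an element whose entire content is exactly "AMA".
   "some_element" is a hypothetical element name. :)
declare variable $exact-value :=
  cts:element-value-query(xs:QName("hl7:some_element"), "AMA", "exact");

(: Q2: any single word of $text, matched exactly. :)
declare variable $any-word :=
  cts:or-query(
    for $word in fn:tokenize($text, "\s+")
    return cts:word-query($word, "exact")
  );

(: Combine the stored queries, then make one paginated cts:search call. :)
declare variable $combined := cts:and-query(($exact-value, $any-word));
declare variable $start := 1;
declare variable $page-size := 10;

cts:search(/hl7:document, $combined)[$start to $start + $page-size - 1]
```

Because `$combined` is an ordinary value, you can reuse it to get a total for pagination. For example, xdmp:estimate(cts:search(/hl7:document, $combined)) does this without rebuilding the query.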
Geert

*Drs. G.P.H. Josten*
/Consultant/

*Daidalos BV*
/Source of Innovation/
Hoekeindsehof 1-4
2665 JZ Bleiswijk
Tel.: +31 (0) 10 850 1200
Fax: +31 (0) 10 850 1199
www.daidalos.nl <http://www.daidalos.nl/>
KvK 27164984


The information sent in or with this email message originates from Daidalos BV and is intended solely for the addressee. If you have received this message unintentionally, please delete it. No rights can be derived from this message.

    ------------------------------------------------------------------------
    *From:* [email protected]
    [mailto:[email protected]] *On Behalf Of
    *Lee, David
    *Sent:* Monday, 16 November 2009 20:17
    *To:* [email protected]
    *Subject:* [MarkLogic Dev General] Analyze text for matches.

    I think I've achieved 1% familiarity with MarkLogic now! It's
    exciting!

    Now, to get to 2%, I have a question which is either obvious or
    impossible :)  OK, 2 questions!

    I've been playing with cts:search(). Amazing. I can search 1 GB
    of XML for matches and get results in under 0.2 seconds. Wow.

    OK, now that I'm impressed, I **want more!**

    Q1: (easy I hope)

    How can I search not only for an exact term but for an exact
    match on an element's entire text content?

    For example, I'm doing this query:

                    cts:search(/hl7:document, cts:word-query($term, "exact"))

    If $term is, say, "AMA", then it matches "AMA" in any element,
    even if it's part of a sentence like

                    <text>Published by the AMA</text>

    This is great. But what if I want an **exact total text
    equality** match, that is, only match

                    <some_element>AMA</some_element>

    The XPath would be

                    //*[string(.) eq $term]

    Is there a cts:search for that? Or should I just use the XPath?
    I like cts:search() because it lets me paginate the results
    efficiently.

    Q2: (harder)

    Suppose I do a search or other extraction and get a string like

                    "Take two aspirin or codeine and call me in the morning"

    I would like to search this entire string and find matches where
    any word in the string is an exact match for some criterion.

    Pseudocode might be:

                    for $word in fn:tokenize($text, " ")
                    return cts:search(fn:collection(), cts:word-query($word, "exact"))

    But I'm sure that's slow as hell. Is there a direct function
    to do this kind of thing? I can't find one.

    For example, I might locate a match for "aspirin" and "codeine".
    As with Q1, I'd like to restrict these to whole, exact matches
    so I don't find things like "take" and "morning".

    Thanks for any suggestions on where to look!

    ----------------------------------------

    David A. Lee

    Senior Principal Software Engineer

    Epocrates, Inc.

    [email protected] <mailto:[email protected]>

    812-482-5224

------------------------------------------------------------------------

_______________________________________________
General mailing list
[email protected]
http://xqzone.com/mailman/listinfo/general