Just to clarify: Shannon's example makes fn:data of the Title element return
the desired string, but it does not change how the text is tokenized for search.
For search purposes, each text node is tokenized separately, so a word never
spans a text node boundary. The following shows the separate text nodes:
let $x := <Title>Magnetic anisotropy data of
C<Subscript>24</Subscript>H<Subscript>12</Subscript></Title>
for $textnode in $x//text()
return <tn>{$textnode}</tn>
=>
<tn>Magnetic anisotropy data of C</tn>
<tn>24</tn>
<tn>H</tn>
<tn>12</tn>
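To look at the word tokens themselves rather than just the text nodes, something
like the following sketch should work. It assumes cts:tokenize is available in
your server version, and it keeps only the cts:word tokens from each text node;
the <word> wrapper element is just a made-up name for display.

```xquery
xquery version "1.0-ml";

(: Same sample Title as above, kept on one line. :)
let $x := <Title>Magnetic anisotropy data of C<Subscript>24</Subscript>H<Subscript>12</Subscript></Title>
for $textnode in $x//text()
return
  <tn>{
    (: Tokenize each text node on its own, as the indexer does. :)
    for $token in cts:tokenize(fn:string($textnode))
    (: Skip whitespace and punctuation tokens; keep only words. :)
    where $token instance of cts:word
    return <word>{fn:string($token)}</word>
  }</tn>
```

Each <tn> holds only the words found inside that one text node.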
So no single text node contains C24H12, and there is no search term for it. If
you want that to be a search term (that is, a term usable by a cts:query), then
you will have to mark up the document somehow to expose that term. For example,
you can rewrite this element as follows:
let $x := <Title>Magnetic anisotropy data of
C<Subscript>24</Subscript>H<Subscript>12</Subscript></Title>
return
element Title { attribute text {fn:string($x)},
$x/node()}
=>
<Title text="Magnetic anisotropy data of C24H12">Magnetic anisotropy data of
C<Subscript>24</Subscript>H<Subscript>12</Subscript></Title>
Then you could do a cts:element-attribute-word-query on Title/@text to search
for your terms.
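A minimal sketch of that search, assuming the documents in the database have
already been rewritten so that each Title carries the text attribute shown
above (the attribute name "text" is just the one used in this example):

```xquery
xquery version "1.0-ml";

(: Assumes Title elements already carry the flattened @text attribute. :)
cts:search(
  fn:doc(),
  cts:element-attribute-word-query(
    xs:QName("Title"),   (: element that holds the attribute :)
    xs:QName("text"),    (: attribute containing fn:string of the Title :)
    "Magnetic anisotropy data of C24H12"))
```

Because the attribute value is a single string, C24H12 is tokenized as part of
that string rather than being split across the Subscript text nodes.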
-Danny
From: [email protected]
[mailto:[email protected]] On Behalf Of Debabrata Jena
Sent: Tuesday, August 24, 2010 11:01 AM
To: General Mark Logic Developer Discussion
Cc: [email protected]
Subject: Re: [MarkLogic Dev General] Phase Through Search problem
Hi Shannon,
Thanks for the answer. It may well solve my problem; I think this might be
the case.
-- Debabrata --
On Tue, Aug 24, 2010 at 11:18 PM, Shannon <[email protected]> wrote:
The data is being tokenized on whitespace, and you're introducing whitespace.
Wouldn't the following solve the problem?
<Title>Magnetic anisotropy data of
C<Subscript>24</Subscript>H<Subscript>12</Subscript></Title>
Just a guess..
On Aug 24, 2010, at 1:40 PM, Shannon wrote:
> Hi Debabrata,
>
> If I'm not mistaken, you want a "Word-Through" which is not currently
> supported. MarkLogic has filed an RFE (#5849, "Enable per-database
> word-through specifications", as well as a Word-Around) for consideration in
> a future release. We have requested that this be implemented in v4.3. The
> only work-around I know of is to duplicate the data to index the word token
> in its entirety.
>
> On Aug 24, 2010, at 1:07 PM, Debabrata Jena wrote:
>
>> Hi,
>>
>> This is regarding not being able to search for a phrase (search term) in an
>> element when the phrase spans both text and child element nodes. Please find
>> the details below and sample data attached.
>> Use Case: search for a phrase that is a combination of text and element
>> nodes. For example, search for "Magnetic anisotropy data of C24H12". Following
>> is the XML representation of that phrase:
>> <Title>
>> Magnetic anisotropy data of C
>> <Subscript>24</Subscript>
>> H
>> <Subscript>12</Subscript>
>> </Title>
>> Approach followed: Added a Phrase Through entry for the Subscript element so
>> that text inside the Subscript element is searchable.
>>
>> Current State: we are not able to search for the text "Magnetic
>> anisotropy data of C24H12" in the Title element by using cts:element-query.
>> However, we are able to find the same text if we pass the search term
>> with spaces, as in "Magnetic anisotropy data of C 24 H 12", in the
>> cts:element-query.
>> Please advise what else needs to be done so that we can search successfully
>> in the above scenario.
>>
>>
>> Thanks,
>> Debabarata
>> _______________________________________________
>> General mailing list
>> [email protected]
>> http://developer.marklogic.com/mailman/listinfo/general