Just to clarify: Shannon's example makes fn:data of the Title element return 
the desired string, but it does not change how the text is tokenized for search.

For search purposes, each text node is tokenized separately.  A word never 
spans a text node boundary.  The following shows the text nodes that get tokenized:

let $x := <Title>Magnetic anisotropy data of C<Subscript>24</Subscript>H<Subscript>12</Subscript></Title>
for $textnode in $x//text()
return <tn>{$textnode}</tn>

=>
<tn>Magnetic anisotropy data of C</tn>
<tn>24</tn>
<tn>H</tn>
<tn>12</tn>

So there is no search term here for C24H12.  If you want that to be a search 
term (that is, a term to be used by cts:query), then you will have to mark up 
the document somehow to extract that term.  For example, you can rewrite this 
element as follows:

let $x := <Title>Magnetic anisotropy data of C<Subscript>24</Subscript>H<Subscript>12</Subscript></Title>
return
  element Title {
    attribute text { fn:string($x) },
    $x/node()
  }

=>
<Title text="Magnetic anisotropy data of C24H12">Magnetic anisotropy data of C<Subscript>24</Subscript>H<Subscript>12</Subscript></Title>

Then you could do a cts:element-attribute-word-query on Title/@text to search 
for your terms.
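
As a rough sketch of that query (not tested against this data): it assumes 
the Title elements have already been stored with the text attribute shown 
above, and it uses the phrase from the original question as the search text.  
The attribute name "text" is only the one chosen in this example.

```xquery
xquery version "1.0-ml";

(: Sketch only: assumes documents already contain Title elements
   rewritten to carry the text attribute, as in the example above. :)
cts:search(
  //Title,
  cts:element-attribute-word-query(
    xs:QName("Title"),    (: element that carries the attribute :)
    xs:QName("text"),     (: attribute holding fn:string() of the Title :)
    "Magnetic anisotropy data of C24H12"))
```

Because the attribute value is a single string, the formula is no longer 
split across text nodes when it is tokenized.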

-Danny

From: [email protected] 
[mailto:[email protected]] On Behalf Of Debabrata Jena
Sent: Tuesday, August 24, 2010 11:01 AM
To: General Mark Logic Developer Discussion
Cc: [email protected]
Subject: Re: [MarkLogic Dev General] Phase Through Search problem

Hi Shannon,
 
Thanks for the answer. It may solve my problem; I think this might be the 
case.
 
-- Debabrata --
On Tue, Aug 24, 2010 at 11:18 PM, Shannon <[email protected]> wrote:
The data is being tokenized on whitespace, and you're introducing whitespace. 
Wouldn't the following solve the problem?

<Title>Magnetic anisotropy data of C<Subscript>24</Subscript>H<Subscript>12</Subscript></Title>

Just a guess..

On Aug 24, 2010, at 1:40 PM, Shannon wrote:

> Hi Debabarata,
>
> If I'm not mistaken, you want a "Word-Through" which is not currently 
> supported. MarkLogic has filed an RFE (#5849, "Enable per-database 
> word-through specifications", as well as a Word-Around) for consideration in 
> a future release. We have requested that this be implemented in v4.3. The 
> only work-around I know of is to duplicate the data to index the word token 
> in its entirety.
>
> On Aug 24, 2010, at 1:07 PM, Debabrata Jena wrote:
>
>> Hi,
>>
>> We are unable to search for a phrase (search term) in an element when the 
>> phrase is a combination of text and child nodes. Please find the details 
>> below and sample data attached.
>> Use case: search for a phrase that is a combination of text and nodes, for 
>> example "Magnetic anisotropy data of C24H12". The XML representation of 
>> this phrase is:
>>   <Title>
>>        Magnetic anisotropy data of C
>>       <Subscript>24</Subscript>
>>        H
>>       <Subscript>12</Subscript>
>>   </Title>
>> Approach followed: added Subscript as a phrase-through element so that 
>> text inside the Subscript element is searchable.
>>
>> Current state: we cannot find the text "Magnetic anisotropy data of 
>> C24H12" in the Title element using cts:element-query. However, we can find 
>> it if we pass the search term with spaces, as in "Magnetic anisotropy data 
>> of C 24 H 12", in the cts:element-query.
>> Please advise what else needs to be done so that the search succeeds for 
>> the above scenario.
>>
>>
>> Thanks,
>> Debabarata
>> _______________________________________________
>> General mailing list
>> [email protected]
>> http://developer.marklogic.com/mailman/listinfo/general