Just be careful when removing stop words that are enclosed in quotes, i.e., inside a phrase; stripping them there changes the phrase being searched for.
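
One way to honor that caveat is sketched below. The stop word list and the two helper functions (local:strip-stop-words, local:strip-outside-quotes) are hypothetical names made up for this illustration, not MarkLogic built-ins. The idea is to split the search text on double quotes, leave every quoted segment untouched, and filter stop words only from the unquoted segments. It assumes the quotes are balanced.

```xquery
xquery version "1.0-ml";

(: Hypothetical stop word list; adjust to your application. :)
declare variable $STOP-WORDS as xs:string* := ('a', 'an', 'and', 'of', 'the');

(: Drop stop words from an unquoted run of text. :)
declare function local:strip-stop-words($text as xs:string) as xs:string
{
  string-join(
    for $token in tokenize(normalize-space($text), ' ')
    where not(lower-case($token) = $STOP-WORDS)
    return $token,
    ' ')
};

(: Splitting on the double quote yields alternating segments: odd positions
   are outside quotes, even positions are inside (assuming balanced quotes). :)
declare function local:strip-outside-quotes($text as xs:string) as xs:string
{
  normalize-space(
    string-join(
      for $segment at $i in tokenize($text, '"')
      return
        if ($i mod 2 = 0)
        then concat('"', $segment, '"')
        else local:strip-stop-words($segment),
      ' '))
};

local:strip-outside-quotes('the history of "the art of war" and peace')
```

For this input the result should be: history "the art of war" peace. The quoted phrase keeps its stop words, while "the", "of" and "and" are dropped from the rest.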

--- On Tue, 9/1/09, [email protected] 
<[email protected]> wrote:

From: [email protected] 
<[email protected]>
Subject: General Digest, Vol 63, Issue 3
To: [email protected]
Date: Tuesday, September 1, 2009, 11:17 AM

Send General mailing list submissions to
    [email protected]

To subscribe or unsubscribe via the World Wide Web, visit
    http://xqzone.com/mailman/listinfo/general
or, via email, send a message with subject or body 'help' to
    [email protected]

You can reach the person managing the list at
    [email protected]

When replying, please edit your Subject line so it is more specific
than "Re: Contents of General digest..."


Today's Topics:

   1. Re: Regular expression - \b Word boundary (Michael Blakeley)
   2. RE: highlighting the snippet result returned by
      additional-query constraint (Colleen Whitney)
   3. RE: "Stop words" using Marklogic (Tim Meagher)


----------------------------------------------------------------------

Message: 1
Date: Tue, 01 Sep 2009 10:44:46 -0700
From: Michael Blakeley <[email protected]>
Subject: Re: [MarkLogic Dev General] Regular expression - \b Word
    boundary
To: General Mark Logic Developer Discussion
    <[email protected]>
Message-ID: <[email protected]>
Content-Type: text/plain; charset=UTF-8; format=flowed

I believe '\b' is roughly equivalent to "start of string OR end of string OR 
non-word character". This seems to work with 4.1-1:

xquery version "1.0-ml";
let $pat := '(^|\W)xxx(\W|$)'
for $test in ('xxx', 'a xxx b', 'xxxz xxx', 'xxx xxxz')
return replace($test, $pat, '$1yyy$2')

=>
yyy
a yyy b
xxxz yyy
yyy xxxz
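
Applied to the sentence from the original question, the same workaround might look like the sketch below. It is an illustration rather than part of the tested example above. XQuery's regular expression dialect derives from XML Schema and does not define \b, so the boundaries are captured explicitly and written back with $1 and $2. The variable names are placeholders.

```xquery
xquery version "1.0-ml";
let $input := 'organize the annual get-together'
(: Capture whatever sits on each side of the word so it can be restored. :)
let $pat := '(^|\W)the(\W|$)'
(: '$1$2' keeps both boundaries and drops the word; normalize-space then
   collapses the doubled space that is left behind. :)
return normalize-space(replace($input, $pat, '$1$2'))
```

This should return: organize annual get-together (the "the" inside "together" is left alone). One limitation to be aware of: each match consumes its boundary characters, so in input such as 'the the' the second occurrence shares its leading space with the first match and is not replaced in the same pass.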

-- Mike

On 2009-09-01 06:59, judie pearline wrote:
> Hi,
> In MarkLogic, when I tried to use the regular expression \b (word boundary) in 
> the replace function, it threw an exception saying "Invalid regular 
> expression".
>
> Code - fn:replace('organize the annual get-together','\bthe\b'," ")
>
> The main purpose is to replace the word "the" with an empty string, but the 
> string "the" inside the word "together" should not be replaced.
>
> Please help me on this.
> Thanks,
> Judie
>
>



------------------------------

Message: 2
Date: Tue, 1 Sep 2009 10:50:35 -0700
From: Colleen Whitney <[email protected]>
Subject: RE: [MarkLogic Dev General] highlighting the snippet result
    returned by additional-query constraint
To: General Mark Logic Developer Discussion
    <[email protected]>
Message-ID:
    <[email protected]>
Content-Type: text/plain; charset="utf-8"

Joshil,

Can you give an example of a document, a query, and the options you are using?
I want to make sure I understand what you’re hoping to get in the highlight.

--Colleen

From: [email protected] 
[mailto:[email protected]] On Behalf Of Joshil Avikkal
Sent: Tuesday, September 01, 2009 8:51 AM
To: [email protected]
Subject: [MarkLogic Dev General] highlighting the snippet result returned by 
additional-query constraint

Hi,

Is there a workaround to get the matching search term highlighted within the 
<search:highlight> element when the search:search() options include an 
additional-query?

Thanks,
--Joshil



------------------------------

Message: 3
Date: Tue, 1 Sep 2009 14:17:16 -0400
From: "Tim Meagher" <[email protected]>
Subject: RE: [MarkLogic Dev General] "Stop words" using Marklogic
To: "'General Mark Logic Developer Discussion'"
    <[email protected]>
Message-ID: <012301ca2b30$7096dc40$530a1...@grace>
Content-Type: text/plain; charset="us-ascii"

Hi Danny,

I have a similar need for using stopwords.  I can't just weight some
elements in my search higher than others because I'm dealing primarily with
variations of a critical search field, i.e., a serial publication title.  It
seems to me that removing stopwords from the search value in conjunction
with using cts:element-word-query() is the most fruitful way to improve
match results.  It could be that I don't fully understand the MarkLogic
options that are provided to use relevancy in such a case.
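
A minimal sketch of that approach, assuming the serial titles are stored in an element named title (a hypothetical name) and that the stop words have already been removed from the user's input. Each remaining word becomes its own cts:element-word-query, and cts:and-query requires all of them to match; passing all the words to a single cts:element-word-query would instead match documents containing any one of them.

```xquery
xquery version "1.0-ml";

(: Assumption: these are the words left after stop word removal from
   'The Journal of the American Chemical Society'. :)
let $terms := ('Journal', 'American', 'Chemical', 'Society')

(: Assumption: titles live in a <title> element; substitute your own. :)
let $query :=
  cts:and-query(
    for $term in $terms
    return cts:element-word-query(xs:QName('title'), $term))

(: Return the first ten matching documents in relevance order. :)
return fn:subsequence(cts:search(fn:doc(), $query), 1, 10)
```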

 

Thanks,

Tim Meagher
AAOM Consulting

From: [email protected]
[mailto:[email protected]] On Behalf Of Danny Sokolsky
Sent: Tuesday, September 01, 2009 12:16 PM
To: General Mark Logic Developer Discussion
Subject: RE: [MarkLogic Dev General] "Stop words" using Marklogic

 

Hi Mano,

MarkLogic Server does not really have a concept of stop words, per se.  A
term is a term, and all the terms in a query are used to calculate
relevance.  The relevance is calculated based on the term frequency and the
number of fragments in the database, so words that are typically thought of
as "stop words" will not add much to the score of a search result.

That being said, it is quite easy to have your application parse the query
text before generating a cts:query.  For example, if your application gets
its text from users via a text box in a browser, you can grab the text from
the request and do an appropriate fn:replace on the string, removing some
list of stop words.  I suspect for many stop word lists, the performance of
this would be fine, assuming the list is not that large.  Depending on how
your application is written, another approach might be to parse the query
after you construct the cts:query, removing unwanted terms.  Each approach
has advantages and disadvantages.
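
The first approach, an fn:replace over the raw query text, could be sketched as follows. The stop word list and variable names are hypothetical. The text is padded so that every word has its own space on each side; without that, two adjacent stop words such as "of the" would have to share one separator and the second would be missed. The sketch assumes the stop words contain no regular expression metacharacters, and it does not treat words with attached punctuation (for example "the,") as stop words.

```xquery
xquery version "1.0-ml";

(: Hypothetical stop word list; adjust to your application. :)
let $stop-words := ('a', 'an', 'and', 'of', 'the')
let $input := 'The Journal of the American Chemical Society'

(: Double every space and add one at each end, so that each word is
   surrounded by spaces it does not share with its neighbours. :)
let $padded := concat(' ', replace(normalize-space($input), ' ', '  '), ' ')

(: Match any stop word standing alone between two spaces. :)
let $pat := concat(' (', string-join($stop-words, '|'), ') ')

(: The 'i' flag makes the match case-insensitive. :)
return normalize-space(replace($padded, $pat, ' ', 'i'))
```

For this input the result should be: Journal American Chemical Society. Note that this version strips stop words everywhere, including inside a quoted phrase.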

 

Another question to ask yourself is this: do you really need to remove the
stop words?  The main reason to remove them (it seems to me) is to give more
relevant answers, and I don't think it will end up making much difference
for that.  You might find better ways of improving your relevance such as
weighting some elements higher than others. 
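
As a rough illustration of weighting, the sketch below uses the optional weight argument of cts:element-word-query. The element names (title, abstract) and the weight values are assumptions chosen for the example, not recommendations.

```xquery
xquery version "1.0-ml";

let $text := 'american chemical society'

(: A match in <title> contributes more to the score than one in <abstract>.
   The empty sequence is the (unused) options argument. :)
let $query :=
  cts:or-query((
    cts:element-word-query(xs:QName('title'), $text, (), 2.0),
    cts:element-word-query(xs:QName('abstract'), $text, (), 0.5)
  ))

(: List the URIs of the ten highest-scoring documents. :)
for $doc in fn:subsequence(cts:search(fn:doc(), $query), 1, 10)
return xdmp:node-uri($doc)
```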

 

-Danny

From: [email protected]
[mailto:[email protected]] On Behalf Of mano m
Sent: Tuesday, September 01, 2009 6:59 AM
To: [email protected]
Subject: [MarkLogic Dev General] "Stop words" using Marklogic

 

Hi,

We need to implement "stop words" in a search application using MarkLogic.
Does MarkLogic support this through an API, or do we need to implement our
own logic to achieve this?

Please share your ideas.

Thanks,

Mano

 


------------------------------

_______________________________________________
General mailing list
[email protected]
http://xqzone.com/mailman/listinfo/general


End of General Digest, Vol 63, Issue 3
**************************************



      