Hi Tim,

There is nothing wrong with using stop words, if that makes sense for your 
application.  I was just trying to suggest that you ask the question and run 
some tests to see whether removing stop words really makes a difference in 
your application.  I think it is highly application-specific.

As for the relevance question: relevance is calculated from the term frequency 
and the total number of documents.  Terms that appear in almost every document 
therefore contribute very little to the score.  If you add such terms to a 
search and you have a sufficiently large database (a large total number of 
documents), you will tend to get the same documents back in approximately the 
same order, with or without the stop words.  Again, your mileage may vary, and 
this can be very content-specific.
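
To make that reasoning concrete, here is a minimal sketch using a generic 
TF-IDF score (term frequency times the log of total documents over documents 
containing the term).  This is an assumption for illustration only: it is not 
MarkLogic's actual scoring formula, it is written in Python rather than 
XQuery, and the toy corpus and the function names idf, score, and rank are 
hypothetical.

```python
import math

# Toy corpus (hypothetical): every document contains the word "the".
docs = {
    "d1": "the journal of the american medical association",
    "d2": "the american journal of medicine",
    "d3": "the new england journal of medicine",
    "d4": "the lancet",
}

def idf(term, corpus):
    """Inverse document frequency: log(N / df), or 0.0 if no document has the term."""
    n_docs = len(corpus)
    df = sum(1 for text in corpus.values() if term in text.split())
    return math.log(n_docs / df) if df else 0.0

def score(query_terms, text, corpus):
    """Sum of tf * idf over the query terms (generic TF-IDF, not MarkLogic's formula)."""
    words = text.split()
    return sum(words.count(t) * idf(t, corpus) for t in query_terms)

def rank(query_terms, corpus):
    """Document ids ordered best score first; ties are broken by id."""
    return sorted(corpus, key=lambda d: (-score(query_terms, corpus[d], corpus), d))

with_stop = rank(["the", "american", "medicine"], docs)
without_stop = rank(["american", "medicine"], docs)
print(with_stop == without_stop)  # the two orderings match for this corpus
```

In this toy corpus "the" appears in every document, so its weight is exactly 
zero and the ordering cannot change.  In a real database the weight of such a 
term would be small rather than exactly zero, which is why the ordering is 
only approximately the same and why testing on your own content matters.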

Partly, it comes down to this:  is it better to answer the exact question 
(query) that was asked or to infer what the user means by the question they 
asked?  So it seems to me it is an application issue.

-Danny


From: [email protected] 
[mailto:[email protected]] On Behalf Of Tim Meagher
Sent: Tuesday, September 01, 2009 11:17 AM
To: 'General Mark Logic Developer Discussion'
Subject: RE: [MarkLogic Dev General] "Stop words" using Marklogic

Hi Danny,

I have a similar need for using stopwords.  I can't just weight some elements 
in my search higher than others because I'm dealing primarily with variations 
of a critical search field, i.e., a serial publication title.  It seems to me 
that removing stopwords from the search value in conjunction with using 
cts:element-word-query() is the most fruitful way to improve match results.  It 
could be that I don't fully understand the MarkLogic options that are provided 
to use relevancy in such a case.

Thanks,

Tim Meagher
AAOM Consulting

________________________________
From: [email protected] 
[mailto:[email protected]] On Behalf Of Danny Sokolsky
Sent: Tuesday, September 01, 2009 12:16 PM
To: General Mark Logic Developer Discussion
Subject: RE: [MarkLogic Dev General] "Stop words" using Marklogic

Hi Mano,

MarkLogic Server does not really have a concept of stop words, per se.  A term 
is a term, and all the terms in a query are used to calculate relevance.  The 
relevance is calculated based on the term frequency and the number of fragments 
in the database, so words that are typically thought of as "stop words" will 
not add much to the score of a search result.

That being said, it is quite easy to have your application parse the query text 
before generating a cts:query.  For example, if your application gets its text 
from users via a text box in a browser, you can grab the text from the request 
and do an appropriate fn:replace on the string, removing some list of stop 
words.  I suspect for many stop word lists, the performance of this would be 
fine, assuming the list is not that large.  Depending on how your application 
is written, another approach might be to parse the query after you construct 
the cts:query, removing unwanted terms.  Each approach has advantages and 
disadvantages.
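
As a rough illustration of the first approach (cleaning the query text before 
any query is built), here is a minimal sketch.  It is written in Python purely 
to show the logic; in a MarkLogic application the same step would be done in 
XQuery before constructing the cts:query.  The stop word list, the 
strip_stop_words name, and the fall-back behavior are all assumptions for the 
example, not part of any MarkLogic API.

```python
import re

# Hypothetical stop word list; a real application would choose its own.
STOP_WORDS = frozenset({"a", "an", "and", "of", "the", "for", "in", "on"})

def strip_stop_words(query_text, stop_words=STOP_WORDS):
    """Return the query's word tokens with stop words removed (case-insensitive).

    If every token is a stop word, the original tokens are returned so the
    search is never run with an empty term list.
    """
    tokens = re.findall(r"\w+", query_text)
    kept = [t for t in tokens if t.lower() not in stop_words]
    return kept if kept else tokens

terms = strip_stop_words("The Journal of the American Medical Association")
print(terms)
```

Matching on whole tokens rather than doing a raw substring replacement avoids 
damaging words that merely contain a stop word (for example "theory" contains 
"the").  The fall-back for all-stop-word queries is one possible design 
choice; an application might prefer to reject such queries instead.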

Another question to ask yourself is this: do you really need to remove the stop 
words?  The main reason to remove them (it seems to me) is to give more 
relevant answers, and I don't think it will end up making much difference for 
that.  You might find better ways of improving your relevance such as weighting 
some elements higher than others.

-Danny

From: [email protected] 
[mailto:[email protected]] On Behalf Of mano m
Sent: Tuesday, September 01, 2009 6:59 AM
To: [email protected]
Subject: [MarkLogic Dev General] "Stop words" using Marklogic

Hi,

We need to implement "stop words" in a search application using MarkLogic. 
Does MarkLogic support this through an API, or do we need to implement our own 
logic to achieve this?
Please share your ideas.
Thanks,
Mano


