Yes, that explains the problem well.

Best solution so far is to rewrite all queries like this:

cts:search(doc(),
  cts:or-query((
    cts:word-query('search', 'exact'),
    cts:word-query('search', 'stemmed')
  ))
)

We're using lib-search to generate complex queries on a lot of fields
and facets, so making this change everywhere in the lib-search code
won't be trivial. There's also presumably a large performance impact,
as it effectively doubles the number of word queries in every search.

So I'm still planning on removing all the xml:lang attributes...

Rob



-----Original Message-----
From: [email protected]
[mailto:[email protected]] On Behalf Of David
Sewell
Sent: 30 March 2009 14:56
To: General Mark Logic Developer Discussion
Subject: RE: [MarkLogic Dev General] stemmed searches

On Mon, 30 Mar 2009, Geert Josten wrote:

>> I don't like this solution but can't think of anything else.
>> Personally I think this is a poor feature of MarkLogic.
>> Turning stemming on/off should not affect the content base
>> searched. Everything should be searched, with content in the
>> configured language gaining the benefits of stemming.
>
> Are you sure that stemming is affecting which documents are being
> searched? It does of course affect how many results are found, but
> since stemming won't work on Old English, you will need to enter
> exactly matching tokens to find results in Old English texts. Stemming
> should only increase the hit ratio, not decrease it.

We have the same issue. It's more a problem of coding verbosity than 
anything else. We have stemmed searching set on our main document 
database. So given data like this

<p xml:lang="eng">In an earlier stage of the Common law it was death.
<foreign xml:lang="lat">si quis in aula regia pugnet, vel arma sua
extrahat et capiatur...</foreign></p>

because our default language is English, the following search returns
no results:

   cts:search(//p, "extrahat")

as it is stemmed, and a stemmed search works only on text in elements
whose @xml:lang is English. So the search must be rewritten as

   cts:search(//p,
      cts:word-query("extrahat", "exact")
   )

But then you lose the stemmed search, which you might want if the search
term was "stage", for example. So either you have to "or" the two kinds
of query together in every search, or choose between one kind of search
and the other.
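
One way to keep both kinds of match without repeating the pair of word
queries at every call site is to wrap them in a small function. This is
only a minimal sketch: local:exact-or-stemmed is a hypothetical helper
name, not part of MarkLogic or lib-search, and it assumes the same //p
data as above.

```xquery
xquery version "1.0-ml";

(: Hypothetical helper: matches $text either exactly (any language)
   or via stemming (text in the database's default language). :)
declare function local:exact-or-stemmed($text as xs:string) as cts:query
{
  cts:or-query((
    cts:word-query($text, "exact"),
    cts:word-query($text, "stemmed")
  ))
};

(: Should find the Latin "extrahat" through the exact branch, while a
   term such as "stage" still benefits from the stemmed branch. :)
cts:search(//p, local:exact-or-stemmed("extrahat"))
```

Note that "exact" is also case- and diacritic-sensitive; if that is too
strict, the "unstemmed" option may be a better fit for the first branch.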


David


-- 
David Sewell, Editorial and Technical Manager
ROTUNDA, The University of Virginia Press
PO Box 801079, Charlottesville, VA 22904-4318 USA
Courier: 310 Old Ivy Way, Suite 302, Charlottesville VA 22903
Email: [email protected]   Tel: +1 434 924 9973
Web: http://rotunda.upress.virginia.edu/
_______________________________________________
General mailing list
[email protected]
http://xqzone.com/mailman/listinfo/general