1) To ignore document language you have to search unstemmed; a stemmed
search is constrained to the language set in the query (or the default
language). The way we handle this is to run a stemmed query in the user's
language, OR-ed with the same query unstemmed.
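
A rough sketch of that OR-ed query follows. It is a toy Python model, not MarkLogic code. The documents, the crude suffix-stripping "stemmer" (toy_stem) and the tuple-based query format are all hypothetical. In MarkLogic itself the equivalent would presumably be a cts:or-query over two cts:word-query calls: one with the "stemmed" and "lang=..." options, and one with "unstemmed".

```python
from dataclasses import dataclass

# Toy model only: Doc, toy_stem and the query tuples are hypothetical.
@dataclass
class Doc:
    uri: str
    lang: str
    text: str

def toy_stem(word: str, lang: str) -> str:
    """Deliberately crude per-language suffix stripping (hypothetical)."""
    suffixes = {"en": ("ing", "s"), "nl": ("en",)}
    w = word.lower()
    for suf in suffixes.get(lang, ()):
        if w.endswith(suf) and len(w) > len(suf) + 2:
            return w[: -len(suf)]
    return w

def matches(query, doc: Doc) -> bool:
    kind = query[0]
    words = doc.text.lower().split()
    if kind == "or":
        return any(matches(q, doc) for q in query[1])
    if kind == "stemmed":
        _, term, lang = query
        # A stemmed match only considers documents in the query's language.
        return doc.lang == lang and toy_stem(term, lang) in {toy_stem(w, lang) for w in words}
    if kind == "unstemmed":
        _, term = query
        # An unstemmed match is an exact word match, whatever the document language.
        return term.lower() in words
    raise ValueError(f"unknown query kind: {kind}")

def language_agnostic_query(term: str, user_lang: str):
    """Stemmed query in the user's language OR-ed with the same term unstemmed."""
    return ("or", [("stemmed", term, user_lang), ("unstemmed", term)])

docs = [
    Doc("/en/1.xml", "en", "searching the archive"),
    Doc("/nl/1.xml", "nl", "zoeken in het archief met search"),
    Doc("/nl/2.xml", "nl", "alleen Nederlandse tekst"),
]

q = language_agnostic_query("search", "en")
hits = [d.uri for d in docs if matches(q, d)]
print(hits)  # ['/en/1.xml', '/nl/1.xml']
```

The English document matches through the stemmed branch ("searching" stems to "search"). The Dutch document matches only through the unstemmed branch, because the stemmed branch is restricted to English.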

3) We don't bother with any stop-word filtering, because stop words occur
in so many documents that they contribute very little to relevance anyway.
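
To illustrate why, here is a small Python sketch of inverse document frequency (IDF), the weighting commonly used in relevance scoring. A word that occurs in every one of N documents gets a weight of log(N/N) = 0. The corpus is made up, and this is plain textbook IDF, not MarkLogic's exact scoring formula.

```python
import math

# Hypothetical mini-corpus; "the" occurs in every document.
corpus = [
    "the thesaurus expands the query",
    "the stemmer reduces words",
    "the index stores stems",
    "stop words like the are common",
]

def idf(term: str, docs: list[str]) -> float:
    """Inverse document frequency: log(N / df), or 0.0 if the term never occurs."""
    df = sum(1 for d in docs if term in d.split())
    return math.log(len(docs) / df) if df else 0.0

for term in ("the", "words", "thesaurus"):
    print(f"{term:10s} idf={idf(term, corpus):.3f}")
```

"the" gets weight 0, "words" (2 of 4 documents) gets log 2, and "thesaurus" (1 of 4) gets log 4. So a stop word barely moves the score even if it is left in the query.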


Rob

-----Original Message-----
From: [email protected]
[mailto:[email protected]] On Behalf Of Geert
Josten
Sent: 05 November 2012 10:21
To: MarkLogic Developer Discussion
Subject: [MarkLogic Dev General] Searching using language features..

Hi,

Several language-support questions this time. Most have been asked
before, but I had trouble putting all the answers together, so I'm just
going to ask them once more:

1) Others have asked this before, but is there a trick to ignore language
in queries and get results for all languages, without doing an or-query
over all the languages you are interested in?

2) MarkLogic has stemming support, but there is also a library for using
thesauri. What is the best way to integrate that into the search library
if I want to use thesauri to expand search terms before running the
actual search? Or, more generally, any code that can expand a term into
a list of synonyms (or related terms)?
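
A minimal sketch of expanding terms before searching: each term is replaced by an OR of itself and its synonyms, and the expanded terms are ANDed together. This is a generic Python illustration, not MarkLogic's thesaurus library. The THESAURUS dictionary and the tuple query format are hypothetical stand-ins for a real thesaurus document and real query constructors.

```python
# Hypothetical thesaurus; in practice this would be loaded from a thesaurus document.
THESAURUS: dict[str, list[str]] = {
    "car": ["automobile", "vehicle"],
    "fast": ["quick", "rapid"],
}

def expand_term(term: str, thesaurus: dict[str, list[str]]):
    """Return ('or', [term, *synonyms]), or the bare term if it has no synonyms."""
    synonyms = thesaurus.get(term.lower(), [])
    if not synonyms:
        return term
    return ("or", [term, *synonyms])

def expand_query(text: str, thesaurus: dict[str, list[str]]):
    """AND together the (possibly expanded) terms of a whitespace-tokenized query."""
    return ("and", [expand_term(t, thesaurus) for t in text.split()])

print(expand_query("fast car rental", THESAURUS))
# ('and', [('or', ['fast', 'quick', 'rapid']), ('or', ['car', 'automobile', 'vehicle']), 'rental'])
```

Keeping expansion as a separate step that rewrites the parsed query leaves the search itself unchanged. Stemming can still be applied to every term, original or synonym.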

3) Stop words: to my knowledge there are no built-in, language-specific
lists of stop words like 'the'. I know I can find stop words by
retrieving the top values (or words) and taking the most common ones up
to some threshold (and perhaps synthesizing static lists from that). But
what is the most efficient way to eliminate them from a search string? I
have some code of my own that tokenizes the string and removes them
dynamically in XQuery on each call, but perhaps someone knows a smarter
trick?
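
The approach described above can be sketched in Python. A static stop-word list is derived once from document frequencies, then reused as a set so that filtering a search string is a single pass of constant-time lookups. The threshold rule (a word is a stop word if it appears in more than half of the documents), the regex tokenizer and the sample documents are assumptions made for illustration.

```python
import re
from collections import Counter

def tokenize(text: str) -> list[str]:
    """Lower-case word tokenizer (a stand-in for a real, language-aware one)."""
    return re.findall(r"\w+", text.lower())

def derive_stop_words(docs: list[str], max_doc_fraction: float = 0.5) -> frozenset[str]:
    """Treat words occurring in more than max_doc_fraction of the docs as stop words.

    Intended to run once (offline); keep the result as a static list.
    """
    doc_freq = Counter()
    for doc in docs:
        doc_freq.update(set(tokenize(doc)))  # count each word once per document
    limit = max_doc_fraction * len(docs)
    return frozenset(word for word, df in doc_freq.items() if df > limit)

def strip_stop_words(query: str, stop_words: frozenset[str]) -> list[str]:
    """Tokenize the search string and drop stop words via set lookups."""
    return [tok for tok in tokenize(query) if tok not in stop_words]

docs = [
    "the cat sat on the mat",
    "the dog ate the bone",
    "a cat and a dog",
    "the end",
]
stop_words = derive_stop_words(docs)
print(stop_words)  # frozenset({'the'})
print(strip_stop_words("The cat on the mat", stop_words))  # ['cat', 'on', 'mat']
```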

Cheers,
Geert


M.Sc. G.P.H. (Geert) Josten
Senior Developer


Dayon B.V.
Delftechpark 37b
2628 XJ Delft
The Netherlands

T +31 (0)88 26 82 570

[email protected]
www.dayon.nl

The information sent in or with this e-mail message originates from
Dayon BV and is intended solely for the addressee. If you have received
this message unintentionally, please delete it. No rights can be derived
from this message.
_______________________________________________
General mailing list
[email protected]
http://developer.marklogic.com/mailman/listinfo/general