Thanks so far!
> -----Original Message-----
> From: [email protected] [mailto:general-[email protected]] On Behalf Of Whitby, Rob, Springer Healthcare UK
> Sent: Monday 5 November 2012 11:50
> To: MarkLogic Developer Discussion
> Subject: Re: [MarkLogic Dev General] Searching using language features..
>
> 1) To ignore document language you have to search unstemmed; a stemmed
> search is constrained to the language set in the query (or the default).
> The way we handle this is to run a stemmed query in the user's language
> OR-ed with the same query unstemmed.
>
> 3) We don't bother with any stop word filtering because they'll have low
> relevance anyway.
>
> Rob
>
> -----Original Message-----
> From: [email protected] [mailto:[email protected]] On Behalf Of Geert Josten
> Sent: 05 November 2012 10:21
> To: MarkLogic Developer Discussion
> Subject: [MarkLogic Dev General] Searching using language features..
>
> Hi,
>
> Several language support related questions this time. Most have been
> asked before, but I had trouble putting all the answers together. So, I'm
> just going to ask them once more:
>
> 1) Others have asked before, but is there a trick to ignore language in
> queries, and get results for all languages, without doing an or-query
> for all languages you are interested in?
>
> 2) MarkLogic has stemming support, but there is also a library to use
> thesauri. What is the best way to integrate that into the search library
> if I would like to use thesauri to expand search terms before doing the
> actual search? Or other similar code that would be able to expand a term
> into a list of all kinds of synonyms (or related terms)..
>
> 3) Stopwords: to my knowledge there are no built-in language-specific
> lists of stop words like 'the'. I know I can find stop words by searching
> for the top number of values (or words) and taking the most common ones up
> to some threshold (and perhaps synthesizing static lists from that). But
> what is the most efficient way to eliminate those from a search string? I
> have some code of my own in which I tokenize and eliminate with xqy
> dynamically, on each call, but perhaps someone knows a smarter trick?
>
> Cheers,
> Geert
>
> M.Sc. G.P.H. (Geert) Josten
> Senior Developer
>
> Dayon B.V.
> Delftechpark 37b
> 2628 XJ Delft
> The Netherlands
>
> T +31 (0)88 26 82 570
>
> [email protected]
> www.dayon.nl
>
> The information sent in or with this e-mail message originates from
> Dayon BV and is intended solely for the addressee. If you have received
> this message unintentionally, we ask you to delete it. No rights can be
> derived from this message.
> _______________________________________________
> General mailing list
> [email protected]
> http://developer.marklogic.com/mailman/listinfo/general
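
For question 3 above (tokenize the search string and drop stop words on each call), here is a rough sketch of the idea outside MarkLogic, in Python rather than xqy. Everything in it is an assumption for illustration: the `STOP_WORDS` lists are hand-picked placeholders rather than lists derived from corpus statistics, and `strip_stop_words` is a hypothetical helper name, not part of any MarkLogic API. The only "trick" is building the per-language sets once, so each call is just a tokenize plus set lookups.

```python
import re

# Hypothetical, hand-picked stop word lists keyed by language code.
# Real lists would be synthesized from the most frequent words in the corpus.
STOP_WORDS = {
    "en": frozenset({"the", "a", "an", "of", "and", "to", "in"}),
    "nl": frozenset({"de", "het", "een", "van", "en", "in"}),
}

# Compiled once; matches runs of word characters (punctuation is dropped).
_TOKEN = re.compile(r"\w+")


def strip_stop_words(query: str, lang: str = "en") -> str:
    """Return the query with stop words for `lang` removed."""
    stops = STOP_WORDS.get(lang, frozenset())  # unknown language: filter nothing
    kept = [tok for tok in _TOKEN.findall(query) if tok.lower() not in stops]
    # If every token was a stop word, keep the original query rather than
    # searching for an empty string.
    return " ".join(kept) if kept else query
```

For example, `strip_stop_words("the history of the Netherlands")` keeps only `history` and `Netherlands`. Note that this rebuilds the string from tokens, so any quoting or operators in the search string would be lost; that would need handling if the query grammar matters.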
