Hi,

I have several questions about language support this time. Most have been asked before, but I had trouble piecing all the answers together, so I'm going to ask them once more:
1) Others have asked this before, but is there a trick to ignore language in queries and get results for all languages, without writing an or-query over every language you are interested in?

2) MarkLogic has stemming support, and there is also a library for working with thesauri. What is the best way to integrate that into the search library if I want to use thesauri to expand search terms before running the actual search? The same goes for similar code that expands a term into a list of synonyms or related terms.

3) Stop words: to my knowledge there are no built-in, language-specific lists of stop words like 'the'. I know I can find candidates by retrieving the most frequent values (or words) and taking the most common ones up to some threshold, perhaps synthesizing static lists from that. But what is the most efficient way to eliminate them from a search string? I have some code of my own that tokenizes the string and removes stop words in XQuery dynamically, on each call, but perhaps someone knows a smarter trick?

Cheers,
Geert

M.Sc. G.P.H. (Geert) Josten
Senior Developer, Dayon B.V.
www.dayon.nl
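
To illustrate question 3, per-call stop-word removal could look roughly like the sketch below. This is an illustration only, not the code mentioned above: the stop list and the function name local:remove-stop-words are hypothetical, and it assumes plain whitespace tokenization with standard XQuery functions rather than any MarkLogic-specific API.

```xquery
xquery version "1.0";

(: Hypothetical stop list for illustration; a real one would be maintained per language. :)
declare variable $STOP-WORDS as xs:string* := ("the", "a", "an", "of", "and");

(: Hypothetical helper: splits the search string on whitespace,
   drops tokens that match the stop list case-insensitively,
   and joins the remaining tokens back into one string. :)
declare function local:remove-stop-words($text as xs:string) as xs:string
{
  fn:string-join(
    for $token in fn:tokenize(fn:normalize-space($text), " ")
    where fn:not(fn:lower-case($token) = $STOP-WORDS)
    return $token,
    " ")
};

local:remove-stop-words("The history of the Netherlands")
```

With the list above, the final expression evaluates to "history Netherlands". The list is scanned again for every token on every call, which is the overhead the question is asking how to avoid.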
