Or, to extend Mike's idea: add two query terms, one case-sensitive and one case-insensitive, and give the latter a lower weight.
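Something along these lines is what I have in mind. It is only a rough sketch: the term "Quaker" and the weight of 0.5 are placeholders I made up for illustration, and whether the case-insensitive term also picks up stemmed forms such as "Quakers" will depend on your database's index settings.

```xquery
xquery version "1.0-ml";

(: Hypothetical search term, for illustration only. :)
let $term := "Quaker"

let $query :=
  cts:or-query((
    (: Exact-case match at full weight (1.0 is the default weight). :)
    cts:word-query($term, "case-sensitive", 1.0),
    (: Any-case match at a lower weight; 0.5 is an arbitrary choice, so
       documents that match only in a different case score lower. :)
    cts:word-query($term, "case-insensitive", 0.5)
  ))

(: cts:search returns results in relevance order by default. :)
for $doc in cts:search(fn:doc(), $query)[1 to 10]
return
  <hit uri="{xdmp:node-uri($doc)}" score="{cts:score($doc)}"/>
```

Documents containing the exact-case form satisfy both terms and should therefore rank above documents that only match case-insensitively, so this gives you the widening in a single pass instead of two.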
Kind regards,
Geert

> -----Original message-----
> From: [email protected] [mailto:general-[email protected]] On behalf of Mike Sokolov
> Sent: Saturday, 17 March 2012 16:09
> To: MarkLogic Developer Discussion
> Subject: Re: [MarkLogic Dev General] Determining stems for proper nouns?
>
> On 3/17/2012 9:02 AM, David Sewell wrote:
> > On Sat, 17 Mar 2012, Mike Sokolov wrote:
> >
> >> It looks as if it just doesn't "know" that there is such a thing as a Quaker
> >> or a Whig, and doesn't apply rule-based stemming to unknown capitalized
> >> words, which is sensible, because how could it know whether (for example):
> >>
> >> Barsoomians is a plural noun that could be stemmed or simply a name (David
> >> Barsoomians) that should not.
> >>
> >> Just a guess, and I have no clue what the MarkLogic word list is, but I
> >> suppose you could derive it from exhaustive search...
> > Right, the brute-force fallback would be processing a lexicon list of
> > all the capitalized words in the database. I'm sort of hoping to avoid
> > that, though.
> >
> Have you considered a two-pass search where you widen by lower-casing
> all terms when no results are found? The result wouldn't be as precise
> as it could be if you knew which terms were in the stemming dict, but
> would enable you to find Young as a name (or at the start of a sentence)
> without matching young, and also match Quaker->Quakers.
>
> -Mike
> _______________________________________________
> General mailing list
> [email protected]
> http://developer.marklogic.com/mailman/listinfo/general
