On Sat, 17 Mar 2012, Mike Sokolov wrote:

> It looks as if it just doesn't "know" that there is such a thing as a Quaker
> or a Whig, and doesn't apply rule-based stemming to unknown capitalized
> words, which is sensible, because how could it know whether (for example):
>
> Barsoomians is a plural noun that could be stemmed or simply a name (David
> Barsoomians) that should not.
>
> Just a guess, and I have no clue what the MarkLogic word list is, but I
> suppose you could derive it from exhaustive search...
Right, the brute-force fallback would be processing a lexicon list of all the capitalized words in the database. I'm sort of hoping to avoid that, though.

-- 
David Sewell, Editorial and Technical Manager
ROTUNDA, The University of Virginia Press
PO Box 400314, Charlottesville, VA 22904-4314 USA
Tel: +1 434 924 9973
Web: http://rotunda.upress.virginia.edu/
_______________________________________________
General mailing list
http://developer.marklogic.com/mailman/listinfo/general
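For illustration only, here is a rough Python sketch of the exhaustive search discussed in this thread: walk every capitalized word from a lexicon and split them into words the stemmer changes and words it leaves untouched (presumably ones missing from its dictionary, like "Quakers" or "Whigs"). It assumes the lexicon has already been exported as a plain list of strings, and that the stemmer can be called as a function returning a list of stems. `partition_capitalized`, `toy_stem`, and `TOY_DICTIONARY` are hypothetical names, and the toy stemmer is a stand-in, not MarkLogic's actual word list or API.

```python
from typing import Callable, Iterable

# Assumed signature: a stemmer maps one word to the list of its stems.
# A real version would wrap the database's own stemmer; this is hypothetical.
Stemmer = Callable[[str], list[str]]


def partition_capitalized(
    words: Iterable[str], stem: Stemmer
) -> tuple[list[str], list[str]]:
    """Split capitalized words into (stemmed, untouched).

    "untouched" means the stemmer returned nothing but the word itself,
    which suggests the word is unknown to its dictionary.
    """
    stemmed: list[str] = []
    untouched: list[str] = []
    for word in words:
        if not word[:1].isupper():
            continue  # only capitalized words are of interest here
        stems = stem(word)
        if any(s != word for s in stems):
            stemmed.append(word)
        else:
            untouched.append(word)
    return stemmed, untouched


# Toy stand-in stemmer for demonstration only: it "knows" exactly one
# capitalized word and returns every other word unchanged.
TOY_DICTIONARY = {"Americans": ["American"]}


def toy_stem(word: str) -> list[str]:
    return TOY_DICTIONARY.get(word, [word])


if __name__ == "__main__":
    lexicon = ["Americans", "Quakers", "Whigs", "houses"]
    stemmed, untouched = partition_capitalized(lexicon, toy_stem)
    print("stemmed:", stemmed)      # stemmed: ['Americans']
    print("untouched:", untouched)  # untouched: ['Quakers', 'Whigs']
```

The cost is one stemmer call per distinct capitalized word, which is why this approach becomes unattractive when the lexicon is large.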
