On Wed, Mar 9, 2011 at 1:36 PM, Peter Desjardins <peter.desjardins.us@gmail.com> wrote:
> Hi.
>
> I'm producing webhelp output
> (http://www.thingbag.net/docbook/gsoc2010/doc/content/index.html) and
> I noticed that when I search for the term "nucleus," the webhelp
> search function removes the letter s and searches for "nucleu."
> "Nucleus" is a commonly used term in my document. I see the same
> behavior with the search term "zeus" and "tutus" becomes "tutu."
>
> Is this a configurable behavior? Is the search function purposely
> simplifying my terms?
>

Hi Peter,

The search runs against the stemmed form of each word in the query, i.e. it purposely reduces each search term to its root word to provide better search support. Link [1] has a short introduction to what a stemmer does and the limitations it has. WebHelp uses the Porter stemmer for English [2], and Snowball stemmers for several other languages [3].

Does it return false results for 'nucleu' when you search for 'nucleus'? We tested the search with stemming, and it worked as expected apart from a few glitches, which are negligible compared to the power it adds!

[1] http://blog.kasunbg.org/2010/10/javascript-stemmer-for-french-language.html
[2] http://snowball.tartarus.org/algorithms/porter/stemmer.html
[3] http://docbook.sourceforge.net/release/xsl/current/webhelp/docs/content/ch03s02.html

--Kasun

>
> Thanks.
>
> Peter Desjardins
>

--
~~~*******'''''''''''''*******~~~
Kasun Gajasinghe,
University of Moratuwa, Sri Lanka.
Blog: http://blog.kasunbg.org
Twitter: http://twitter.com/kasunbg
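
To illustrate the stemming behaviour described above, here is a minimal sketch in Python of step 1a (plural handling) of the published Porter algorithm [2]. It is not the JavaScript code that WebHelp ships, and the function name `porter_step_1a` is made up for this example. It only shows why a trailing "s" is dropped from terms such as "nucleus"; the full Porter stemmer applies several further steps.

```python
def porter_step_1a(word: str) -> str:
    """Apply only step 1a of the Porter stemmer (illustrative, not WebHelp's code)."""
    if word.endswith("sses"):
        return word[:-2]  # caresses -> caress
    if word.endswith("ies"):
        return word[:-2]  # ponies -> poni
    if word.endswith("ss"):
        return word       # caress -> caress (unchanged)
    if word.endswith("s"):
        return word[:-1]  # cats -> cat, but also nucleus -> nucleu
    return word


# The search terms from the original question.
for term in ("nucleus", "zeus", "tutus"):
    print(term, "->", porter_step_1a(term))
```

Since the indexed words and the query terms are reduced in the same way, a query for "nucleus" should still match pages that contain "nucleus", even though the stem "nucleu" is not itself an English word.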
