Hi,
On Fri, Mar 18, 2011 at 1:21 AM, Peter Desjardins <peter.desjardins.us@gmail.com> wrote:
> Hi. I am fielding some questions about the search behavior in the
> webhelp output. Is there an explanation of the behavior available
> somewhere?
>
> Specifically, I need to understand:
>
> * How substrings are handled. Why does "locale" match "localeString"
> but "crea" doesn't match "create"?
>
The stemmed root of create/created/creating/creat is "creat", so all of
these words produce the same output. The stem of "crea" is "crea" itself,
which is not actually a word, so it does not match "creat". Likewise,
"localeString" stems to "localeStr" (stemmers tend to remove suffixes such
as -ing and -ed) and "locale" stems to "local". Did these produce the same
output for you?
You can check how it behaves by running the JavaScript function
stemmer(string) in Google Chrome's console or in Firebug for Firefox.
Results:
    stemmer("create")        => "creat"
    stemmer("crea")          => "crea"
    stemmer("localeString")  => "localeStr"
    stemmer("locale")        => "local"
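To make the matching side concrete, here is a toy sketch (not the webhelp code) of how a stemmed index behaves, assuming that both the indexed words and the query are stemmed and then looked up by exact stem. toyStem, buildIndex and search are hypothetical names, and toyStem is a deliberately crude suffix stripper rather than the stemmer() shipped with webhelp; it only happens to reproduce the four results above.

```javascript
// Toy suffix stripper, for illustration only. It is NOT webhelp's stemmer(),
// which is more careful about when a suffix may be removed.
function toyStem(word) {
  for (const suffix of ["ing", "ed", "e"]) {
    // Only strip when at least three characters remain afterwards.
    if (word.length > suffix.length + 2 && word.endsWith(suffix)) {
      return word.slice(0, -suffix.length);
    }
  }
  return word;
}

// Hypothetical index: stemmed word -> set of pages containing it.
function buildIndex(pages) {
  const index = new Map();
  for (const [page, text] of Object.entries(pages)) {
    for (const word of text.split(/\s+/)) {
      if (!word) continue;
      const stem = toyStem(word);
      if (!index.has(stem)) index.set(stem, new Set());
      index.get(stem).add(page);
    }
  }
  return index;
}

// The query is stemmed the same way, then looked up by exact stem.
function search(index, query) {
  return [...(index.get(toyStem(query)) ?? [])];
}

const index = buildIndex({
  "intro.html": "create a project",
  "api.html": "creating objects",
});
console.log(search(index, "created")); // ["intro.html", "api.html"]
console.log(search(index, "crea"));    // []
```

Because the lookup is by exact stem, a partial word such as "crea" only finds a page if it happens to reduce to the same root as a word that was indexed.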
>
> * Is there a way to search for strings that contain special characters
> like periods. Can I search for "foo.bar" by escaping the period? Can I
> remove the period from the list of special characters?
>
As David said, this is something we need to fix. Stemming plays no part
here, i.e. stemmer("foo.bar") == "foo.bar". The problem is that "foo" and
"bar" get indexed as separate words.
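A minimal sketch of the difference, assuming the indexer splits text on every non-alphanumeric character (the real webhelp indexer is not shown here). tokenizeSplitAll and tokenizeKeepDots are hypothetical names; the second one only illustrates what a fix could look like, keeping a period that sits between two word characters as part of the token.

```javascript
// Assumed current behaviour: split on any run of non-alphanumeric
// characters, so "foo.bar" becomes two separate entries.
function tokenizeSplitAll(text) {
  return text.split(/[^A-Za-z0-9]+/).filter(Boolean);
}

// Hypothetical fix: a period between two alphanumeric runs stays inside
// the token, so dotted names such as "foo.bar" are indexed whole.
// A trailing sentence period is still dropped.
function tokenizeKeepDots(text) {
  return text.match(/[A-Za-z0-9]+(?:\.[A-Za-z0-9]+)*/g) ?? [];
}

console.log(tokenizeSplitAll("Call foo.bar() now.")); // ["Call", "foo", "bar", "now"]
console.log(tokenizeKeepDots("Call foo.bar() now.")); // ["Call", "foo.bar", "now"]
```

Since stemmer("foo.bar") leaves the string unchanged, a tokenizer along these lines would let a query for "foo.bar" be looked up as a single word.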
Regards,
--Kasun
> Thanks for your help. I have turned off stemming in case that matters.
>
> Peter Desjardins
>
--
~~~*******'''''''''''''*******~~~
Kasun Gajasinghe,
University of Moratuwa,
Sri Lanka.
Blog: http://kasunbg.blogspot.com
Twitter: http://twitter.com/kasunbg