On Oct 11, 2005, at 7:52 AM, Hugo Lafayette wrote:
> Why not include that in the FrenchStemFilter "next()" method
> itself? Would that be bad design?
I agree with your assessment. Conceptually, this is a stemming
problem. By extension, it's not a tokenizing problem, and the
behavior of the StandardTokenizer is fine.
The most recent Perl/CPAN version of the Snowball stemming library
added an option to strip leading l', d', etc. in French. I know this
because until that version it didn't strip trailing 's in English
either, and I had to write some workarounds, just like you'll have to.
http://search.cpan.org/search?query=snowball
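
A minimal sketch of what such a workaround might look like in Java, kept free of any Lucene or Snowball dependency. The class name ElisionStripper, the method stripElision, and the prefix list are all hypothetical, chosen for illustration; the sketch also assumes a plain ASCII apostrophe (a typographic ’ would need to be normalized first).

```java
// Hypothetical helper for illustration; not part of Lucene or Snowball.
final class ElisionStripper {
    // Assumed list of elided French articles, pronouns and conjunctions.
    // This list is a guess at a reasonable starting point; extend as needed.
    private static final String[] ELIDED_PREFIXES = {
        "l", "d", "j", "m", "t", "s", "n", "c", "qu"
    };

    private ElisionStripper() {}

    /** Returns the term without a leading elision such as l' or d'. */
    static String stripElision(String term) {
        int apos = term.indexOf('\'');
        if (apos <= 0 || apos == term.length() - 1) {
            return term; // no apostrophe, or nothing on one side of it
        }
        String prefix = term.substring(0, apos).toLowerCase(java.util.Locale.ROOT);
        for (String elided : ELIDED_PREFIXES) {
            if (prefix.equals(elided)) {
                return term.substring(apos + 1); // keep only the word itself
            }
        }
        return term; // unknown prefix: leave the term untouched
    }
}
```

A filter placed ahead of the stemmer could apply something like stripElision to each token's text, so that the tokenizer itself stays unchanged.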
I'm curious: are there any cases in French where a string with an
apostrophe in it ought to be split into two searchable tokens? I
know of no such cases in English: you never want to search for the ll
in you'll, or the O in O'Reilly, etc.
Marvin Humphrey
Rectangular Research
http://www.rectangular.com/