> On Jul 8, 2016, at 08:36, Abhinav Upadhyay <[email protected]>
> wrote:
>
> On Fri, Jul 8, 2016 at 8:56 PM, Tom Ivar Helbekkmo <[email protected]>
> wrote:
>> Abhinav Upadhyay <[email protected]> writes:
>>
>>> We just need to handle the special cases where we don't want to stem :)
>>
>> ...or perhaps do the stemming only when the resulting stem is found in
>> /usr/share/dict/words?
>
> Yes, that's probably a good idea. I first need to write the custom
> tokenizer and I can probably use that dictionary to decide what to
> stem and what not to stem.
>
> -
> Abhinav

In principle, a lot of technical names are marked up in mandoc as “.Tn foo”,
which might provide a good list of words “not to stem.”
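A rough sketch of how the two suggestions could combine. It collects the first argument of each “.Tn” line as a never-stem set, and it stems any other word only if the stem appears in a dictionary. The helper names (toy_stem, tn_words, load_dictionary, maybe_stem) are hypothetical and do not come from apropos. toy_stem is a stand-in suffix stripper, not a real stemmer such as Porter's.

```python
import re

# Hypothetical stand-in for a real stemmer (e.g. Porter): strips a few
# common English suffixes, keeping at least 3 characters of stem.
SUFFIXES = ("ing", "ed", "es", "s")

def toy_stem(word):
    for suf in SUFFIXES:
        if word.endswith(suf) and len(word) - len(suf) >= 3:
            return word[: -len(suf)]
    return word

# Assumption: only ".Tn" at the start of a line is recognised, and only
# its first argument is taken (trailing punctuation args are ignored).
TN_RE = re.compile(r"^\.Tn[ \t]+(\S+)", re.MULTILINE)

def tn_words(mdoc_source):
    """Collect .Tn arguments (lowercased) as words never to stem."""
    return {w.lower() for w in TN_RE.findall(mdoc_source)}

def load_dictionary(path="/usr/share/dict/words"):
    """Load one word per line, lowercased."""
    with open(path) as f:
        return {line.strip().lower() for line in f if line.strip()}

def maybe_stem(word, dictionary, no_stem):
    """Stem only if the word isn't a .Tn name and the stem is a real word."""
    w = word.lower()
    if w in no_stem:
        return w
    stem = toy_stem(w)
    return stem if stem in dictionary else w
```

With a page containing “.Tn Windows”, maybe_stem("Windows", ...) keeps "windows" intact. Without that markup it would reduce to "window". A word such as "kings" is left alone when "king" is missing from the dictionary.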
Erik Fair