Hi Abhinav,

> On 8 Jul 2016, at 8:08 AM, Abhinav Upadhyay <[email protected]> wrote:
>
> Hi Paul,
>
>> On Fri, Jul 8, 2016 at 7:24 AM, Paul Goyette <[email protected]> wrote:
>> With a reasonably current 7.99.33 (less than a week old), I noticed that
>> when I request
>>
>> apropos kms
>>
>> (expecting to find man pages referencing "xxxdrmkms"), it seems to find a
>> lot of entries for "km". Is this intended? None of the found entries has
>> "kms", only "km".
>>
>> I really didn't expect to find anything about kilometers, or meta-keys,
>> or khmer (cambodian language?)!
>
> This is one of the shortcomings of apropos(1) right now. While
> indexing the man pages, the tokenizer does stemming of the words being
> indexed. Stemming essentially tries to reduce the words to their root
> words, for example
> running -> run
> eating -> eat
> eats -> eat
> listened -> listen

Is there a way to disable the stemming (preferably via config or an
environment variable)?

Thilo

> It does this by removing suffixes like 's', 'es', 'ing', 'ed' from the
> words. Therefore, 'kms' when being indexed, gets stored as 'km'. Same
> is the case for 'ffs', 'lfs', 'ntfs' etc :)
>
> It applies the same algorithm when doing the search, so when you enter
> 'kms' it first stems it down to 'km' and then does the search. This is
> needed because when doing the indexing, 'kms' was stored as 'km', so
> searching for an unstemmed 'kms' would not match anything.
>
> Stemming is an essential part of implementing full text search and
> except for such cases, it works really well. I'm planning to write a
> custom tokenizer implementation which will not stem technical keywords
> like kms, lfs, ntfs, nfs, etc. That will fix these problems, it's
> coming soon :)
>
> -
> Abhinav
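
For anyone following the thread, the behaviour described above can be
illustrated with a minimal Python sketch. It is not the tokenizer
apropos(1) actually uses, only a naive suffix stripper built from the
suffixes named in the quoted message. The function names, the
minimum-length rule, and the PROTECTED keyword list are assumptions made
up for this illustration; the second function only guesses at how a
"do not stem technical keywords" tokenizer might behave.

```python
# Naive suffix stripping, for illustration only. This is NOT the stemmer
# apropos(1) uses; all names here are hypothetical.
SUFFIXES = ("ing", "es", "ed", "s")

# Hypothetical list of technical keywords that should be left unstemmed.
PROTECTED = {"kms", "lfs", "ffs", "ntfs", "nfs"}


def naive_stem(word):
    """Strip the first matching suffix, keeping at least two characters."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 2:
            stem = word[:-len(suffix)]
            # Undo a doubled final consonant left behind by "ing"/"ed",
            # e.g. "runn" becomes "run".
            if (suffix in ("ing", "ed") and len(stem) >= 2
                    and stem[-1] == stem[-2]
                    and stem[-1] not in "aeioulsz"):
                stem = stem[:-1]
            return stem
    return word


def stem_with_exceptions(word):
    """Like naive_stem, but leave protected technical keywords untouched."""
    word = word.lower()
    if word in PROTECTED:
        return word
    return naive_stem(word)


if __name__ == "__main__":
    for w in ("running", "eats", "listened", "kms", "ntfs"):
        print(w, naive_stem(w), stem_with_exceptions(w))
```

With this sketch, indexing and searching both map 'kms' to 'km', which is
why unrelated 'km' entries show up. The exception list keeps 'kms' intact
on both sides, so the query and the index would still agree.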
