Hi Paul,

On Fri, Jul 8, 2016 at 7:24 AM, Paul Goyette <[email protected]> wrote:
> With a reasonably current 7.99.33 (less than a week old), I noticed that
> when I request
>
>     apropos kms
>
> (expecting to find man pages referencing "xxxdrmkms"), it seems to find a
> lot of entries for "km". Is this intended? None of the found entries has
> "kms", only "km".
>
> I really didn't expecting to find anything about kilometers, or meta-keys,
> or khmer (cambodian language?)!

This is one of the shortcomings of apropos(1) right now. While indexing the man pages, the tokenizer stems the words being indexed. Stemming essentially tries to reduce words to their root form, for example:

    running  -> run
    eating   -> eat
    eats     -> eat
    listened -> listen

It does this by removing suffixes like 's', 'es', 'ing', 'ed' from the words. Therefore 'kms', when being indexed, gets stored as 'km'. The same is the case for 'ffs', 'lfs', 'ntfs' etc :)

It applies the same algorithm when doing the search, so when you enter 'kms' it first stems it down to 'km' and then does the search. This is needed because 'kms' was stored as 'km' during indexing, so searching for the unstemmed 'kms' would not find anything.

Stemming is an essential part of implementing full text search and, except for cases like these, it works really well. I'm planning to write a custom tokenizer implementation which will not stem technical keywords like kms, lfs, ntfs, nfs, etc. That will fix these problems; it's coming soon :)

- Abhinav
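
For illustration, the behaviour described above and the planned fix can be sketched roughly in Python. This is only a toy suffix stripper, not the stemmer or tokenizer that apropos(1) actually uses, and the names `naive_stem`, `stem`, `build_index`, `search` and `TECHNICAL_KEYWORDS` are hypothetical. It assumes the fix amounts to an exemption list that is consulted both at index time and at query time.

```python
# Toy illustration only: not a real stemmer, and not apropos(1)'s code.
SUFFIXES = ("ing", "ed", "es", "s")  # checked in this order

# Hypothetical exemption list of technical keywords that must not be stemmed.
TECHNICAL_KEYWORDS = {"kms", "ffs", "lfs", "ntfs", "nfs"}


def naive_stem(word):
    """Strip the first matching suffix, keeping at least two characters."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 2:
            return word[: -len(suffix)]
    return word


def stem(word, protect_keywords=False):
    """Stem a word, optionally leaving exempted technical keywords intact."""
    if protect_keywords and word.lower() in TECHNICAL_KEYWORDS:
        return word.lower()
    return naive_stem(word)


def build_index(docs, protect_keywords=False):
    """Map each stemmed token to the set of document names containing it."""
    index = {}
    for name, text in docs.items():
        for token in text.split():
            index.setdefault(stem(token, protect_keywords), set()).add(name)
    return index


def search(index, query, protect_keywords=False):
    """Stem the query the same way as the index, then look it up."""
    return sorted(index.get(stem(query, protect_keywords), set()))


docs = {
    "drmkms": "kernel mode setting kms",
    "units": "distance in km",
}

# Without the exemption list, 'kms' is stemmed to 'km' and matches both pages.
plain = build_index(docs)
print(search(plain, "kms"))

# With the exemption list, 'kms' is indexed and searched unchanged.
guarded = build_index(docs, protect_keywords=True)
print(search(guarded, "kms", protect_keywords=True))
```

The point to notice is that the index and the query must go through the same `stem` call with the same setting; if only one side skipped stemming, 'kms' would stop matching altogether.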
