This sounds great. Both doing it and digging up the reference chart from the ancient SMART docs.
One question that I have is whether the default Lucene similarity score fits into this scheme. See http://hudson.zones.apache.org/hudson/job/Lucene-trunk/javadoc//org/apache/lucene/search/Similarity.html for example.

This could either go near the vector classes themselves (they will very often get used together, but vectors are math and idf is application) or into something like utils as you suggest. My guess is that you should pick somewhere and see if anybody minds. I kind of expect they won't.

One critique that I might have is that using a string like this drops all type checking. Wouldn't it be better to have a class with static members such as LNC, LTC and so on?

On Wed, Oct 1, 2008 at 12:46 PM, Allen Day <[EMAIL PROTECTED]> wrote:

> I've implemented privately a few combinations of these for Vector:
>
> http://nlp.stanford.edu/IR-book/html/htmledition/document-and-query-weighting-schemes-1.html
>
> Now I'm considering to make and contribute a more generic class based
> on SMART notation to do this. Usage might be something like:
>
> ...
>
> Is it worthwhile to add a class to Mahout for these utility functions?
> Where should this go? mahout.utils?

-- ted
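
To make the type-checking point concrete, here is a minimal sketch of what named constants such as LNC and LTC could look like in place of a string. Everything in it is an assumption for illustration: the enum name SmartWeighting, the nested enums, and the termWeight method are hypothetical and not part of Mahout or Lucene. It assumes the usual reading of the SMART letters (l = logarithmic tf, n = no idf, t = idf, c = cosine normalization) and uses the natural logarithm.

```java
/**
 * Hypothetical sketch, not an existing Mahout class: SMART weighting
 * schemes as enum constants, so a typo fails at compile time instead of
 * being discovered when a string is parsed at run time.
 */
enum SmartWeighting {
    LNC(TermFrequency.LOGARITHM, DocumentFrequency.NONE, Normalization.COSINE),
    LTC(TermFrequency.LOGARITHM, DocumentFrequency.IDF, Normalization.COSINE);

    enum TermFrequency { NATURAL, LOGARITHM }
    enum DocumentFrequency { NONE, IDF }
    enum Normalization { NONE, COSINE }

    final TermFrequency termFrequency;
    final DocumentFrequency documentFrequency;
    final Normalization normalization;

    SmartWeighting(TermFrequency tf, DocumentFrequency df, Normalization norm) {
        this.termFrequency = tf;
        this.documentFrequency = df;
        this.normalization = norm;
    }

    /**
     * Weight of a single term before normalization. The normalization
     * component is only recorded here, because cosine normalization needs
     * the whole vector rather than one term.
     */
    double termWeight(int tf, int df, int numDocs) {
        if (tf <= 0 || df <= 0) {
            return 0.0;
        }
        double weight = termFrequency == TermFrequency.LOGARITHM
                ? 1.0 + Math.log(tf)   // assumption: natural logarithm
                : tf;
        if (documentFrequency == DocumentFrequency.IDF) {
            weight *= Math.log((double) numDocs / df);
        }
        return weight;
    }
}
```

With something along these lines, a caller writes SmartWeighting.LTC.termWeight(tf, df, numDocs), and a misspelled scheme name is rejected by the compiler. Strings could still be accepted at the edges through SmartWeighting.valueOf("LTC").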
