This sounds great, both doing it and digging up the reference chart from
the ancient SMART docs.

One question that I have is whether the default Lucene similarity score fits
into this scheme.  See
http://hudson.zones.apache.org/hudson/job/Lucene-trunk/javadoc//org/apache/lucene/search/Similarity.html
for example.

This could either go near the vector classes themselves (they will very
often get used together, but vectors are math and idf is application) or
into something like utils as you suggest.

My guess is that you should pick somewhere and see if anybody minds.  I kind
of expect they won't.

One critique that I might have is that using a string like this drops all
type checking.  Wouldn't it be better to have a class with static members
such as LNC, LTC and so on?
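
A rough sketch of what I mean, so the compiler catches a mistyped scheme.
Every name here (SmartWeighting, weigh, the nested enums) is made up for
illustration and is not existing Mahout API.  It covers only the n/l, n/t
and n/c letters from the SMART chart, and it assumes natural log for both
the tf and idf components:

```java
/** Hypothetical sketch: typed constants for SMART weighting schemes. */
final class SmartWeighting {
    enum TermFrequency { NATURAL, LOGARITHMIC }   // SMART letters n, l
    enum DocumentFrequency { NONE, IDF }          // SMART letters n, t
    enum Normalization { NONE, COSINE }           // SMART letters n, c

    // Static members replace free-form strings such as "lnc" or "ltc".
    static final SmartWeighting LNC = new SmartWeighting(
        TermFrequency.LOGARITHMIC, DocumentFrequency.NONE, Normalization.COSINE);
    static final SmartWeighting LTC = new SmartWeighting(
        TermFrequency.LOGARITHMIC, DocumentFrequency.IDF, Normalization.COSINE);

    private final TermFrequency tf;
    private final DocumentFrequency df;
    private final Normalization norm;

    private SmartWeighting(TermFrequency tf, DocumentFrequency df, Normalization norm) {
        this.tf = tf;
        this.df = df;
        this.norm = norm;
    }

    /** Returns a new array of weights; the inputs are left unchanged. */
    double[] weigh(double[] termFreqs, int[] docFreqs, int numDocs) {
        double[] w = new double[termFreqs.length];
        double sumSquares = 0.0;
        for (int i = 0; i < w.length; i++) {
            double t = termFreqs[i];
            if (tf == TermFrequency.LOGARITHMIC) {
                t = t > 0 ? 1.0 + Math.log(t) : 0.0;
            }
            double d = 1.0;
            if (df == DocumentFrequency.IDF) {
                // Terms that occur in no document get weight zero.
                d = docFreqs[i] > 0 ? Math.log((double) numDocs / docFreqs[i]) : 0.0;
            }
            w[i] = t * d;
            sumSquares += w[i] * w[i];
        }
        if (norm == Normalization.COSINE && sumSquares > 0.0) {
            double length = Math.sqrt(sumSquares);
            for (int i = 0; i < w.length; i++) {
                w[i] /= length;
            }
        }
        return w;
    }
}
```

A caller would then write SmartWeighting.LTC.weigh(tf, df, n), and a typo
like LTX fails at compile time instead of at run time.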

On Wed, Oct 1, 2008 at 12:46 PM, Allen Day <[EMAIL PROTECTED]> wrote:

> I've implemented privately a few combinations of these for Vector:
>
>
> http://nlp.stanford.edu/IR-book/html/htmledition/document-and-query-weighting-schemes-1.html
>
> Now I'm considering to make and contribute a more generic class based
> on SMART notation to do this.  Usage might be something like:
>
> ...
>
> Is it worthwhile to add a class to Mahout for these utility functions?
>  Where should this go?  mahout.utils?
>
>


-- 
ted
