I've privately implemented a few combinations of these weighting schemes for Vector:
http://nlp.stanford.edu/IR-book/html/htmledition/document-and-query-weighting-schemes-1.html
Now I'm considering writing and contributing a more generic class based
on SMART notation to do this. Usage might be something like:
// initialize and populate these vectors...
Vector queryVector = new SparseVector(100);
Vector weightVector = new SparseVector(100);
// ...then use the weights vector to transform the query vector.
// apply log1p(tf(t)) * log(N/df(t)) * 1. See the nlp.stanford.edu link above.
Vector transformedVector = org.apache.mahout.utils.VectorUtils.transform(
    "ltn", queryVector, weightVector);
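
To make the idea a bit more concrete, here is a rough, self-contained sketch of how a three-letter SMART code could be dispatched. It is not existing Mahout API. The class name SmartWeighting, the transform signature, and the use of plain double[] arrays in place of Vector are all assumptions for illustration. It also assumes the second array holds document frequencies df(t), with the collection size N passed separately, and that logs are natural logs. Only a few letters from the IR book table are covered (tf: n, l, b; df: n, t; normalization: n, c). Note that the table's "l" component is 1 + log(tf), which differs slightly from the log1p(tf) = log(1 + tf) in the comment above; the sketch uses the table's form.

```java
public class SmartWeighting {

    /**
     * Hypothetical helper: applies a three-letter SMART scheme such as "ltn".
     * tf[i] is the term frequency of term i, df[i] its document frequency,
     * and numDocs the collection size N (all assumed conventions).
     */
    public static double[] transform(String scheme, double[] tf, double[] df, int numDocs) {
        if (scheme == null || scheme.length() != 3) {
            throw new IllegalArgumentException("SMART scheme must have three letters");
        }
        if (tf.length != df.length) {
            throw new IllegalArgumentException("tf and df must have the same length");
        }
        double[] out = new double[tf.length];
        for (int i = 0; i < tf.length; i++) {
            out[i] = termFrequency(scheme.charAt(0), tf[i])
                   * documentFrequency(scheme.charAt(1), df[i], numDocs);
        }
        return normalize(scheme.charAt(2), out);
    }

    // First letter: term frequency component.
    static double termFrequency(char code, double tf) {
        switch (code) {
            case 'n': return tf;                                  // natural
            case 'l': return tf > 0 ? 1.0 + Math.log(tf) : 0.0;   // logarithm
            case 'b': return tf > 0 ? 1.0 : 0.0;                  // boolean
            default: throw new IllegalArgumentException("unsupported tf code: " + code);
        }
    }

    // Second letter: document frequency component.
    static double documentFrequency(char code, double df, int numDocs) {
        switch (code) {
            case 'n': return 1.0;                                 // none
            case 't': return df > 0 ? Math.log(numDocs / df) : 0.0; // idf
            default: throw new IllegalArgumentException("unsupported df code: " + code);
        }
    }

    // Third letter: normalization component.
    static double[] normalize(char code, double[] v) {
        if (code == 'n') {
            return v;                                             // none
        }
        if (code == 'c') {                                        // cosine
            double sumSquares = 0.0;
            for (double x : v) {
                sumSquares += x * x;
            }
            double norm = Math.sqrt(sumSquares);
            if (norm == 0.0) {
                return v; // all-zero vector: nothing to scale
            }
            double[] out = new double[v.length];
            for (int i = 0; i < v.length; i++) {
                out[i] = v[i] / norm;
            }
            return out;
        }
        throw new IllegalArgumentException("unsupported normalization code: " + code);
    }
}
```

A real version would presumably work on Vector directly and iterate only over the non-zero elements of a sparse vector; the arrays here just keep the sketch independent of any particular Mahout release.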
Is it worthwhile to add a class to Mahout for these utility functions?
Where should this go? mahout.utils?
-Allen