There is a utility in the Apache Mahout project that dumps documents as weight vectors.
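
To make "documents as weight vectors" concrete, here is a minimal sketch in plain Java. It deliberately uses no Lucene or Mahout calls: the class name TfIdfSketch and the methods weights and cosine are made up for illustration, the documents are assumed to be already tokenized, and the weighting is plain tf * ln(N / df), which is one common TF-IDF variant and not necessarily what either library computes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, not part of Lucene or Mahout.
class TfIdfSketch {

    // Returns one term -> weight map per document, in input order.
    // Weight = (term count in the document) * ln(N / document frequency).
    static List<Map<String, Double>> weights(List<List<String>> docs) {
        // Document frequency: how many documents contain each term.
        Map<String, Integer> df = new HashMap<>();
        for (List<String> doc : docs) {
            for (String term : new HashSet<>(doc)) {
                df.merge(term, 1, Integer::sum);
            }
        }

        int n = docs.size();
        List<Map<String, Double>> result = new ArrayList<>();
        for (List<String> doc : docs) {
            // Raw term frequency within this document.
            Map<String, Integer> tf = new HashMap<>();
            for (String term : doc) {
                tf.merge(term, 1, Integer::sum);
            }
            // TreeMap keeps terms sorted so the printed output is stable.
            Map<String, Double> vector = new TreeMap<>();
            for (Map.Entry<String, Integer> e : tf.entrySet()) {
                double idf = Math.log((double) n / df.get(e.getKey()));
                vector.put(e.getKey(), e.getValue() * idf);
            }
            result.add(vector);
        }
        return result;
    }

    // Cosine similarity between two sparse weight vectors.
    // Returns 0 when either vector has zero length.
    static double cosine(Map<String, Double> a, Map<String, Double> b) {
        double dot = 0.0;
        for (Map.Entry<String, Double> e : a.entrySet()) {
            Double other = b.get(e.getKey());
            if (other != null) {
                dot += e.getValue() * other;
            }
        }
        double normA = 0.0;
        for (double v : a.values()) {
            normA += v * v;
        }
        double normB = 0.0;
        for (double v : b.values()) {
            normB += v * v;
        }
        if (normA == 0.0 || normB == 0.0) {
            return 0.0;
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    public static void main(String[] args) {
        // Toy corpus standing in for doc1, doc2, doc3.
        List<List<String>> docs = Arrays.asList(
                Arrays.asList("apple", "banana", "apple"),
                Arrays.asList("banana", "cherry"),
                Arrays.asList("cherry", "cherry", "date"));

        List<Map<String, Double>> vectors = weights(docs);
        for (int i = 0; i < vectors.size(); i++) {
            System.out.println("doc" + (i + 1) + ", " + vectors.get(i));
        }
        System.out.println("cosine(doc1, doc2) = "
                + cosine(vectors.get(0), vectors.get(1)));
    }
}
```

Note that with this weighting a term that occurs in every document gets weight 0, so it contributes nothing to the similarity. For a real index you would read the term counts from the index itself instead of from token lists.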

On Tue, Oct 5, 2010 at 11:01 AM, William Koscho <wkos...@gmail.com> wrote:

> How do I get the weights for all terms in all documents?
>
> For a given set of documents, what are the series of API calls I need to
> make to get the following type of information:
>
> doc1, termA_weight, termB_weight, etc..
> doc2, termC_weight, termD_weight, etc..
> doc3, termE_weight, termZ_weight, etc..
>
> It seems that I have to start with a Query object, that is typically
> provided by an end-user. However, in my case, I don't have an end user or a
> specific query. Instead I am trying to analyze the documents and interested
> in getting the weights of all terms so that I can compute some statistics
> about the similarity among documents.
>
> Thanks in advance,
> Bill