Thanks, but in my case I do not have the files. What I have is just a collection of vectors of terms.
Does Lucene provide any means to index each vector of terms as a file? Or is there a better way to do this? Thanks everyone once again.

Regards,
Sengly
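(One possible way to do this, sketched here only as an illustration and not taken from the thread: join each pre-tokenized vector back into a whitespace-separated string and index it with a WhitespaceAnalyzer, so the indexed tokens are exactly the extracted terms. The index path and the field name "contents" are assumptions.)

    import java.io.IOException;
    import java.util.List;

    import org.apache.lucene.analysis.WhitespaceAnalyzer;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;
    import org.apache.lucene.index.IndexWriter;

    public class IndexTermVectors {
        // Index each pre-tokenized vector of terms as one Lucene document.
        public static void indexVectors(List<String[]> vectors) throws IOException {
            IndexWriter writer = new IndexWriter("/path/to/index", new WhitespaceAnalyzer(), true);
            for (String[] terms : vectors) {
                StringBuilder sb = new StringBuilder();
                for (int i = 0; i < terms.length; i++) {
                    if (i > 0) sb.append(' ');
                    sb.append(terms[i]);
                }
                Document doc = new Document();
                // WhitespaceAnalyzer re-tokenizes on the spaces inserted above,
                // so the indexed tokens are exactly the original terms.
                doc.add(new Field("contents", sb.toString(), Field.Store.NO, Field.Index.TOKENIZED));
                writer.addDocument(doc);
            }
            writer.optimize();
            writer.close();
        }
    }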
On 3/28/07, thomas arni <[EMAIL PROTECTED]> wrote:

Have a look at the "TermDocs" interface in the API. You can get the term
frequency with an open IndexReader:

  TermDocs termDocs = reader.termDocs(term);

where "term" represents the current Term. Now you can call:

  termDocs.freq()

to get the frequency of the term within the current document.

For the calculation of the idf, you can use the formula provided by
"DefaultSimilarity". To get the document frequency, which is necessary to
calculate the idf, you can call:

  reader.docFreq(term)

Hope this helps...

Thomas

Sengly Heng wrote:
> Hello Luceners,
>
> I have a collection of vectors of terms (tokens) that I extracted from
> files. I am looking for ways to calculate the TF/IDF of each term.
>
> I wanted to use Lucene to do this, but Lucene is made for collections of
> files, and in my case I have already extracted those files into vectors of
> terms. I know it is not very difficult to implement this measurement, but I
> guess there should be such an API available. Does anyone of you know any
> Java API that directly handles this problem, or do I have to implement it
> from scratch?
>
> Any idea would be highly appreciated.
>
> Thank you in advance.
>
> Best regards,
>
> Sengly
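(Putting Thomas's pointers together, a minimal sketch of how the lookup could look, assuming a Lucene 2.x index; the index path and the field name "contents" are assumptions, and the idf formula is the one DefaultSimilarity uses.)

    import java.io.IOException;

    import org.apache.lucene.index.IndexReader;
    import org.apache.lucene.index.Term;
    import org.apache.lucene.index.TermDocs;

    public class TfIdfExample {
        public static void main(String[] args) throws IOException {
            IndexReader reader = IndexReader.open("/path/to/index");
            Term term = new Term("contents", "lucene");

            // tf: frequency of the term inside each document that contains it
            TermDocs termDocs = reader.termDocs(term);
            while (termDocs.next()) {
                System.out.println("doc=" + termDocs.doc() + " tf=" + termDocs.freq());
            }
            termDocs.close();

            // idf: based on how many documents contain the term
            int docFreq = reader.docFreq(term);   // document frequency of the term
            int numDocs = reader.numDocs();
            // same formula as DefaultSimilarity.idf(): 1 + log(numDocs / (docFreq + 1))
            double idf = 1.0 + Math.log((double) numDocs / (docFreq + 1));
            System.out.println("docFreq=" + docFreq + " idf=" + idf);

            reader.close();
        }
    }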