I believe TermDocs(t).freq() gives me the number of documents in which the term appears, as opposed to the total number of times the term appears. Is there a particular reason for choosing that metric? It seems less accurate, but maybe it's the only one that can be easily retrieved.
That's correct. This is computed because it is required for IDF weighting, the most common a priori term weighting technique.
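To make the difference between the two counts concrete, here is a minimal Python sketch. It is not Lucene code: the toy corpus and the names doc_freq, total_occurrences, and idf are hypothetical, and the smoothed log formula is just one common IDF variant, assumed here for illustration.

```python
import math

# Hypothetical toy corpus: each document is a list of tokens.
docs = [
    ["lucene", "index", "index", "index"],
    ["lucene", "search"],
    ["lucene", "query", "search"],
]

def doc_freq(term, docs):
    """Number of documents that contain the term at least once."""
    return sum(1 for d in docs if term in d)

def total_occurrences(term, docs):
    """Total number of times the term occurs across all documents."""
    return sum(d.count(term) for d in docs)

def idf(term, docs):
    """Smoothed IDF (an assumed variant): log(N / (df + 1)) + 1."""
    return math.log(len(docs) / (doc_freq(term, docs) + 1)) + 1.0

for term in ("index", "lucene"):
    print(term, doc_freq(term, docs), total_occurrences(term, docs), idf(term, docs))
```

In this corpus "index" occurs three times but only in one document, so its document frequency is 1 and it receives a higher IDF than "lucene", which occurs in every document. IDF only needs the per-document count, which is why that is the number being computed.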
As for the costs mentioned below, I'd assume that they would differ from machine to machine.
They probably vary somewhat, but should stay roughly proportional. If one factor is 100 times another on one machine, it will most likely still be considerably larger than the other on a different machine.
Doug