RE: extracting top k frequently occurring terms from a given set of documents

Spencer, Dave Thu, 14 Nov 2002 22:38:13 -0800

I'm pretty sure you call IndexReader.terms(),
go through the returned TermEnum,
and sort the terms by TermEnum.docFreq().
You would probably use a SortedMap (TreeMap) and
a custom Comparator. Actually, it may be simpler to
put each Term, together with its docFreq, into a List
and call Collections.sort() on it with that Comparator.
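Here is a minimal sketch of the sort step in Java. It assumes you have already collected the (term text, docFreq) pairs. To keep it runnable without the Lucene jar, the Lucene calls are left out. The hypothetical Map passed to topK stands in for what you would gather while stepping through the TermEnum with next() and reading term().text() and docFreq() at each position.

TopTerms, TermFreq and topK are illustrative names, not Lucene API. Note that docFreq() is the number of documents that contain a term, not its total number of occurrences. This therefore ranks terms by how many documents they appear in.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class TopTerms {

    // Hypothetical holder for one (term text, docFreq) pair. A Lucene Term does
    // not carry its docFreq, so the count read from the enumeration has to be
    // stored next to the term while enumerating.
    private static final class TermFreq {
        final String text;
        final int docFreq;

        TermFreq(String text, int docFreq) {
            this.text = text;
            this.docFreq = docFreq;
        }
    }

    // Returns the texts of the k terms with the highest docFreq, highest first.
    // The input map is a stand-in for the pairs collected from a TermEnum.
    public static List<String> topK(Map<String, Integer> docFreqs, int k) {
        List<TermFreq> all = new ArrayList<>();
        for (Map.Entry<String, Integer> e : docFreqs.entrySet()) {
            all.add(new TermFreq(e.getKey(), e.getValue()));
        }
        // Custom Comparator: descending docFreq, ties broken by ascending text
        // so the result is deterministic.
        Collections.sort(all, (a, b) -> {
            if (a.docFreq != b.docFreq) {
                return Integer.compare(b.docFreq, a.docFreq);
            }
            return a.text.compareTo(b.text);
        });
        List<String> result = new ArrayList<>();
        for (int i = 0; i < Math.min(k, all.size()); i++) {
            result.add(all.get(i).text);
        }
        return result;
    }

    public static void main(String[] args) {
        // Made-up counts standing in for what the TermEnum loop would collect.
        Map<String, Integer> docFreqs = new LinkedHashMap<>();
        docFreqs.put("apache", 3);
        docFreqs.put("index", 7);
        docFreqs.put("lucene", 9);
        docFreqs.put("query", 7);
        docFreqs.put("term", 1);
        System.out.println(topK(docFreqs, 3)); // prints [lucene, index, query]
    }
}
```

"index" and "query" tie at 7, so the alphabetical tie-break decides their order. Keeping the pairs in a List matters: a TreeMap keyed by docFreq would keep only one term per count. For a very large vocabulary, a size-k min-heap (java.util.PriorityQueue) is an alternative that avoids sorting every term.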



http://jakarta.apache.org/lucene/docs/api/org/apache/lucene/index/IndexReader.html#terms()

http://jakarta.apache.org/lucene/docs/api/org/apache/lucene/index/TermEnum.html

-----Original Message-----
From: Vinay Kakade [mailto:vinaykakade@;yahoo.com]
Sent: Thursday, November 14, 2002 10:03 PM
To: [EMAIL PROTECTED]
Subject: extracting top k frequently occurring terms from a given set of documents


Hi
I want to use Lucene to extract the top 10 most frequently
occurring terms from a given set of HTML documents.
Please let me know how Lucene can be used for this
purpose: how can I get the most frequently occurring
terms after building an index on the documents
with the Lucene indexer?
Please help me
regards
Vinay.


--
To unsubscribe, e-mail:
<mailto:lucene-user-unsubscribe@;jakarta.apache.org>
For additional commands, e-mail:
<mailto:lucene-user-help@;jakarta.apache.org>

