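I'm pretty sure you call IndexReader.terms(), walk through the TermEnum it returns, and sort the terms by TermEnum.docFreq(), which is the number of documents that contain the term. You could use a SortedMap (TreeMap) with a custom Comparator, but a TreeMap ordered only by frequency would treat terms with equal counts as duplicates, so it is probably simpler to put each Term into a List and call Collections.sort() on it.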
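Here is a rough sketch of the sorting half, in plain Java so it runs without Lucene on the classpath. It assumes you have already collected each term's document frequency into a Map; the class name TopTerms and the method topK are made up for illustration. Going by the API docs linked below, filling the map should look roughly like `TermEnum terms = reader.terms(); while (terms.next()) { map.put(terms.term().text(), terms.docFreq()); } terms.close();`, but I have not tested that part. Note that the same text in two different fields is two different Terms, so you may want to key on field plus text.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class TopTerms {

    /**
     * Returns the k entries with the highest document frequency, highest first.
     * Ties are broken alphabetically by term text so the order is deterministic.
     * The input map (term text -> docFreq) is an assumption of this sketch.
     */
    public static List<Map.Entry<String, Integer>> topK(Map<String, Integer> docFreqs, int k) {
        // Copy the entries into a List; unlike a TreeMap ordered on frequency,
        // a List keeps terms that share the same frequency.
        List<Map.Entry<String, Integer>> entries =
                new ArrayList<Map.Entry<String, Integer>>(docFreqs.entrySet());

        entries.sort((a, b) -> {
            int byFreq = b.getValue().compareTo(a.getValue()); // descending frequency
            return byFreq != 0 ? byFreq : a.getKey().compareTo(b.getKey());
        });

        // Keep only the first k entries (or all of them if there are fewer).
        int limit = Math.min(Math.max(k, 0), entries.size());
        return new ArrayList<Map.Entry<String, Integer>>(entries.subList(0, limit));
    }
}
```

Calling TopTerms.topK(map, 10) would then give the ten most frequent terms. Sorting every term is wasteful on a large index; a bounded priority queue of size k would avoid that, but the full sort is the closest match to the List plus Collections.sort() idea above.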
http://jakarta.apache.org/lucene/docs/api/org/apache/lucene/index/IndexReader.html#terms()
http://jakarta.apache.org/lucene/docs/api/org/apache/lucene/index/TermEnum.html

-----Original Message-----
From: Vinay Kakade [mailto:vinaykakade@yahoo.com]
Sent: Thursday, November 14, 2002 10:03 PM
To: [EMAIL PROTECTED]
Subject: extracting top k frequently occuring terms from a given set of documents

Hi

I want to use Lucene to extract the top 10 most frequently occurring terms from a given set of HTML documents. Please let me know how Lucene can be used for this purpose. I want to know how I can get the frequently occurring terms after building an index on the given set of documents using the Lucene indexer.

Please help me

regards
Vinay.

--
To unsubscribe, e-mail: <mailto:lucene-user-unsubscribe@jakarta.apache.org>
For additional commands, e-mail: <mailto:lucene-user-help@jakarta.apache.org>
