Hi,
You could just try the following code to print the term freq for individual
terms.
************************
public static void printTermFreq(String indexPath) throws
CorruptIndexException, IOException{
IndexReader ir = IndexReader.open(new NIOFSDirectory(new
File(indexPath)));
TermEnum te = ir.terms();
while(te.next()){
System.out.println(te.term().text()+":"+te.docFreq());
}
ir.close();
}
*************************
For phrases, you'd have to build/use a different tokenizer that tokenizes
the input text into those phrases (stored as a single term). Something of an
ngram, and then treat those phrases at terms.
Doing it at runtime would not be a feasible option.
--
Anshum Gupta
http://ai-cafe.blogspot.com
On Thu, Jan 20, 2011 at 3:30 PM, Ashish Pancholi <[email protected]>wrote:
>
> Using Lucene_3.0.3. we would like to implement following:
> The number of occurrences of the term in the entire index.
> For Example :
> If we have indexed following text : amazon, amazon s3, amazon
> simpledb, amazon aws;
> Then we are supposed to get results :
>
> amazon 4
> s3 1
> simpledb 1
> aws 1
>
> Showing phrases by combining the words, in the same sequence as they were
> in
> original text, before Indexing.
>
> Lets suppose, we have indexed following text : amazon, amazon s3,
> amazon simpledb, amazon aws;
> And for two word pharses we are supposed to get results
>
> amazon amazon
> amazon s3
> s3 amazon
> amazon simpledb
> simpledb amazon
> amazon aws
>
> And for three word pharses we are supposed to get results :
>
> amazon amazon s3
> amazon s3 amazon
> s3 amazon simpledb
> amazon simpledb amazon
> simpledb amazon aws
>
> Any help will be appreciated.
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Please-Help-tp2293832p2293832.html
> Sent from the Lucene - Java Users mailing list archive at Nabble.com.
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [email protected]
> For additional commands, e-mail: [email protected]
>
>