Uwe
Thanks - I figured that bit out. I'm a Lucene 'newbie'.
What I would like to know, though, is whether it is practical to search a
single document with a single field simply by doing this:
IndexReader trd = IndexReader.open(index);
// Returns null unless term vectors were enabled for "title" at index time.
TermFreqVector tfv = trd.getTermFreqVector(docId, "title");
String[] terms = tfv.getTerms();
int[] freqs = tfv.getTermFrequencies();
// terms[i] and freqs[i] are parallel arrays.
for (int i = 0; i < terms.length; i++) {
    System.out.println("Term " + terms[i] + " Freq: " + freqs[i]);
}
trd.close();
where docId is set to 0.
The code works, but can it be improved upon at all?
My situation is that I don't want to count the number of documents containing
a particular string. Rather, I want counts of the individual words in one
field of one document, so I can concatenate the strings before passing them
to Lucene.
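
For the narrower goal described above (per-word counts within one field of one document), a plain-Java tally is one possible alternative that needs no index at all. The sketch below is not Lucene's analysis chain: it assumes that lowercasing and splitting on characters that are neither letters nor digits is an acceptable stand-in for an analyzer, it removes no stop words, and the names FieldWordCounts and count are hypothetical.

```java
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical helper: counts words in one field's text without using Lucene. */
public class FieldWordCounts {

    /**
     * Lowercases the text, splits it on runs of characters that are neither
     * letters nor digits, and tallies each token. Assumption: this crude
     * tokenization stands in for whatever analyzer an index would use.
     */
    public static Map<String, Integer> count(String text) {
        Map<String, Integer> counts = new TreeMap<String, Integer>(); // sorted by term
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{Nd}]+")) {
            if (token.isEmpty()) {
                continue; // split() yields an empty token when the text is empty or starts with a separator
            }
            Integer seen = counts.get(token);
            counts.put(token, seen == null ? 1 : seen + 1);
        }
        return counts;
    }

    public static void main(String[] args) {
        for (Map.Entry<String, Integer> e : count("Lucene for Dummies").entrySet()) {
            System.out.println("Term " + e.getKey() + " Freq: " + e.getValue());
        }
    }
}
```

If the counts need to match what a Lucene query would see (stemming, stop words, and so on), the term-vector route is probably the better fit, because it reuses the index's own analyzer.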
-----Original Message-----
From: Uwe Schindler [mailto:[email protected]]
Sent: 20 Oct 2010 19:40
To: [email protected]
Subject: RE: Using a TermFreqVector to get counts of all words in a document
TermVectors are only available when enabled for the field/document.
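
A minimal sketch of what enabling term vectors might look like, assuming Lucene 3.0.x on the classpath (the API generation used elsewhere in this thread). The field is created with Field.TermVector.YES so that getTermFreqVector has something to return. The class name, the RAMDirectory, and the StandardAnalyzer are assumptions for illustration, since the thread does not show the original addDoc helper.

```java
import java.io.IOException;
import java.util.Map;
import java.util.TreeMap;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.TermFreqVector;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.RAMDirectory;
import org.apache.lucene.util.Version;

/** Hypothetical demo class; assumes the Lucene 3.0.x API. */
public class TermVectorSketch {

    /** Indexes one title with term vectors enabled and returns term -> frequency. */
    public static Map<String, Integer> termCounts(String title) throws IOException {
        Directory index = new RAMDirectory(); // in-memory index, for illustration only
        IndexWriter w = new IndexWriter(index, new StandardAnalyzer(Version.LUCENE_30),
                true, IndexWriter.MaxFieldLength.UNLIMITED);
        Document doc = new Document();
        // The last argument is what makes getTermFreqVector return non-null.
        doc.add(new Field("title", title, Field.Store.YES,
                Field.Index.ANALYZED, Field.TermVector.YES));
        w.addDocument(doc);
        w.close();

        Map<String, Integer> counts = new TreeMap<String, Integer>();
        IndexReader reader = IndexReader.open(index, true); // read-only
        try {
            TermFreqVector tfv = reader.getTermFreqVector(0, "title");
            if (tfv != null) { // null when no vector was stored for this field
                String[] terms = tfv.getTerms();
                int[] freqs = tfv.getTermFrequencies();
                for (int i = 0; i < terms.length; i++) {
                    counts.put(terms[i], freqs[i]);
                }
            }
        } finally {
            reader.close();
        }
        return counts;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(termCounts("Lucene for Dummies"));
    }
}
```

The terms returned are the analyzed tokens, so they reflect whatever lowercasing or stop-word filtering the chosen analyzer applies.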
-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: [email protected]
> -----Original Message-----
> From: Martin O'Shea [mailto:[email protected]]
> Sent: Wednesday, October 20, 2010 8:23 PM
> To: [email protected]
> Subject: Using a TermFreqVector to get counts of all words in a document
>
> Hello
>
> I am trying to use a TermFreqVector to get a count of all words in a Document
> as follows:
>
> // Search.
> int hitsPerPage = 10;
> IndexSearcher searcher = new IndexSearcher(index, true);
> TopScoreDocCollector collector =
>     TopScoreDocCollector.create(hitsPerPage, true);
> searcher.search(q, collector);
> ScoreDoc[] hits = collector.topDocs().scoreDocs;
>
> // Display results.
> int docId = 0;
> System.out.println("Found " + hits.length + " hits.");
> for (int i = 0; i < hits.length; ++i) {
>     docId = hits[i].doc;
>     Document d = searcher.doc(docId);
>     System.out.println((i + 1) + ". " + d.get("title"));
>     IndexReader trd = IndexReader.open(index);
>     TermFreqVector tfv = trd.getTermFreqVector(docId, "title");
>     System.out.println(tfv.getTerms().toString());
>     System.out.println(tfv.getTermFrequencies().toString());
> }
>
> The code is very rough as its only an experiment but I'm under the impression
> that the getTerms and getTermFrequencies methods for a TermFreqVector
> should allow each word and its frequency in the document to be displayed. All I
> get though is a NullPointerError. The index consists of a single document made
> up of a simple string:
>
> IndexWriter w = new IndexWriter(index, analyzer, true,
>     IndexWriter.MaxFieldLength.UNLIMITED);
> addDoc(w, "Lucene for Dummies");
>
> And the queryString being used is simply "dummies".
>
> Thanks
>
> Martin O'Shea.
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]