RE: Using a TermFreqVector to get counts of all words in a document

Martin O'Shea Wed, 20 Oct 2010 11:53:29 -0700

Uwe

Thanks - I figured that bit out. I'm a Lucene 'newbie'.


What I would like to know, though, is whether it is practical to get the word
counts for one field of a single document simply by doing this:

IndexReader trd = IndexReader.open(index);
try {
    // getTermFreqVector returns null if the field has no stored term vector.
    TermFreqVector tfv = trd.getTermFreqVector(docId, "title");
    if (tfv != null) {
        String[] terms = tfv.getTerms();
        int[] freqs = tfv.getTermFrequencies();
        for (int i = 0; i < terms.length; i++) {
            System.out.println("Term " + terms[i] + " Freq: " + freqs[i]);
        }
    }
} finally {
    trd.close();
}

where docId is set to 0.

The code works, but can it be improved upon at all?

In my situation I don't want to count the number of documents containing a
particular string. Rather, I want counts of the individual words in one field
of a single document, which is why I concatenate the strings into one document
before passing it to Lucene.
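If only the word counts are needed, and none of Lucene's analysis, a similar term-to-frequency table can be built with plain Java. This is a minimal sketch; the `SimpleTermCounter` class and its splitting rule (lowercase, then break on anything that is not a letter or digit) are assumptions for illustration and will not match a Lucene analyzer exactly. For example, no stop words are removed.

```java
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

public class SimpleTermCounter {
    // Hypothetical helper, not part of Lucene: lowercases the text, splits it
    // on runs of characters that are neither letters nor digits, and counts
    // each resulting token. TreeMap keeps the terms in alphabetical order.
    static Map<String, Integer> countTerms(String text) {
        Map<String, Integer> counts = new TreeMap<String, Integer>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (token.length() == 0) {
                continue; // a leading separator produces an empty first token
            }
            Integer old = counts.get(token);
            counts.put(token, old == null ? 1 : old + 1);
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = countTerms("Lucene for Dummies, lucene for everyone");
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            System.out.println("Term " + e.getKey() + " Freq: " + e.getValue());
        }
    }
}
```

For the sample string in `main`, "lucene" and "for" are each counted twice, and "dummies" and "everyone" once.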

-----Original Message-----
From: Uwe Schindler [mailto:[email protected]] 
Sent: 20 Oct 2010 19:40
To: [email protected]
Subject: RE: Using a TermFreqVector to get counts of all words in a document

TermVectors are only available if they were enabled for the field when the
document was indexed; otherwise getTermFreqVector() returns null.

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: [email protected]
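Following Uwe's point, here is a minimal sketch of indexing the title with term vectors enabled and reading them back, assuming Lucene 3.0.x on the classpath. `RAMDirectory`, `StandardAnalyzer` and the `TermVectorDemo` class are illustrative choices, not taken from the thread. The relevant part is the last `Field` constructor argument, `Field.TermVector.YES`. Without it, `getTermFreqVector()` returns null, which is consistent with the NullPointerException in the original message.

```java
import java.io.IOException;
import java.util.Map;
import java.util.TreeMap;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.TermFreqVector;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.RAMDirectory;
import org.apache.lucene.util.Version;

public class TermVectorDemo {
    // Indexes one document whose "title" field stores a term vector, then
    // reads the vector back and returns a term -> frequency map.
    static Map<String, Integer> titleTermCounts(String title) throws IOException {
        Directory index = new RAMDirectory();
        IndexWriter w = new IndexWriter(index,
                new StandardAnalyzer(Version.LUCENE_30), true,
                IndexWriter.MaxFieldLength.UNLIMITED);
        Document doc = new Document();
        // Field.TermVector.YES is what makes getTermFreqVector() non-null.
        doc.add(new Field("title", title, Field.Store.YES,
                Field.Index.ANALYZED, Field.TermVector.YES));
        w.addDocument(doc);
        w.close();

        Map<String, Integer> counts = new TreeMap<String, Integer>();
        IndexReader reader = IndexReader.open(index);
        try {
            // The first document added to a fresh index has doc id 0.
            TermFreqVector tfv = reader.getTermFreqVector(0, "title");
            if (tfv != null) {
                String[] terms = tfv.getTerms();
                int[] freqs = tfv.getTermFrequencies();
                for (int i = 0; i < terms.length; i++) {
                    counts.put(terms[i], freqs[i]);
                }
            }
        } finally {
            reader.close();
        }
        return counts;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(titleTermCounts("Lucene for Dummies"));
    }
}
```

With StandardAnalyzer the stored terms are the analyzed ones. They are lowercased, and common English stop words such as "for" are dropped, so the counts reflect what was indexed rather than the raw string.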


> -----Original Message-----
> From: Martin O'Shea [mailto:[email protected]]
> Sent: Wednesday, October 20, 2010 8:23 PM
> To: [email protected]
> Subject: Using a TermFreqVector to get counts of all words in a document
> 
> Hello
>
> I am trying to use a TermFreqVector to get a count of all words in a
> Document as follows:
>
>     // Search.
>     int hitsPerPage = 10;
>     IndexSearcher searcher = new IndexSearcher(index, true);
>     TopScoreDocCollector collector =
>             TopScoreDocCollector.create(hitsPerPage, true);
>     searcher.search(q, collector);
>     ScoreDoc[] hits = collector.topDocs().scoreDocs;
>
>     // Display results.
>     System.out.println("Found " + hits.length + " hits.");
>     for (int i = 0; i < hits.length; ++i) {
>         int docId = hits[i].doc;
>         Document d = searcher.doc(docId);
>         System.out.println((i + 1) + ". " + d.get("title"));
>         // Reuse the searcher's reader instead of opening a new one per hit.
>         TermFreqVector tfv =
>                 searcher.getIndexReader().getTermFreqVector(docId, "title");
>         // Arrays.toString prints the elements; toString() on an array does not.
>         System.out.println(java.util.Arrays.toString(tfv.getTerms()));
>         System.out.println(java.util.Arrays.toString(tfv.getTermFrequencies()));
>     }
>
> The code is very rough as it's only an experiment, but I'm under the
> impression that the getTerms and getTermFrequencies methods of a
> TermFreqVector should allow each word and its frequency in the document to
> be displayed. All I get, though, is a NullPointerException. The index
> consists of a single document made up of a simple string:
>
> IndexWriter w = new IndexWriter(index, analyzer, true,
>         IndexWriter.MaxFieldLength.UNLIMITED);
> addDoc(w, "Lucene for Dummies");
>
> And the queryString being used is simply "dummies".
>
> Thanks
>
> Martin O'Shea.
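When printing the two parallel arrays that a TermFreqVector returns, note that calling `toString()` directly on a Java array prints its type and identity hash rather than its contents, while `java.util.Arrays.toString` prints the elements. This is a small sketch; the `PrintParallelArrays` class, its `format` helper and the sample values are illustrative only, not output captured from Lucene.

```java
import java.util.Arrays;

public class PrintParallelArrays {
    // Formats parallel term/frequency arrays (the shape returned by
    // getTerms() and getTermFrequencies()) as "term=freq" pairs.
    static String format(String[] terms, int[] freqs) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < terms.length; i++) {
            if (i > 0) {
                sb.append(", ");
            }
            sb.append(terms[i]).append('=').append(freqs[i]);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Illustrative values only.
        String[] terms = {"dummies", "lucene"};
        int[] freqs = {1, 1};
        System.out.println(terms.toString());       // type and hash, e.g. [Ljava.lang.String;@...
        System.out.println(Arrays.toString(terms)); // [dummies, lucene]
        System.out.println(Arrays.toString(freqs)); // [1, 1]
        System.out.println(format(terms, freqs));   // dummies=1, lucene=1
    }
}
```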



---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



