On Oct 20, 2010, at 2:53 PM, Martin O'Shea wrote:

> Uwe
> 
> Thanks - I figured that bit out. I'm a Lucene 'newbie'.
> 
> What I would like to know, though, is whether it is practical to read the term
> frequencies of a single field of one document simply by doing this:
> 
> IndexReader trd = IndexReader.open(index);
> TermFreqVector tfv = trd.getTermFreqVector(docId, "title");
> if (tfv != null) { // null when no term vector was stored for this field
>     String[] terms = tfv.getTerms();
>     int[] freqs = tfv.getTermFrequencies();
>     for (int i = 0; i < terms.length; i++) {
>         System.out.println("Term " + terms[i] + " Freq: " + freqs[i]);
>     }
> }
> trd.close();
> 
> where docId is set to 0.
> 
> The code works but can this be improved upon at all?
> 
> My situation is that I don't want to count the number of documents containing
> a particular string. Rather, I want to get counts of the individual words in a
> field of a document, so I can concatenate the strings before passing them to
> Lucene.

Can you describe the bigger problem you are trying to solve?  This looks like a 
classic XY problem: http://people.apache.org/~hossman/#xyproblem

What you are doing above will work fine for what you describe (up to the "passing 
it to Lucene" part), but you should probably explore TermVectorMapper, which 
provides a callback mechanism (similar to a SAX parser) that lets you build your 
data structures on the fly instead of serializing them into two parallel arrays 
and then looping over those arrays to create some other structure.
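
For illustration, here is a minimal sketch of that approach. It assumes the
IndexReader.getTermFreqVector(int, String, TermVectorMapper) overload, and the
mapper class name (FreqMapper) is just a placeholder, not anything in Lucene;
exception handling is omitted:

    import java.util.HashMap;
    import java.util.Map;

    import org.apache.lucene.index.IndexReader;
    import org.apache.lucene.index.TermVectorMapper;
    import org.apache.lucene.index.TermVectorOffsetInfo;

    // Collects term -> frequency straight into a Map as the term vector is
    // read, skipping the intermediate parallel arrays.
    class FreqMapper extends TermVectorMapper {
        final Map<String, Integer> termFreqs = new HashMap<String, Integer>();

        @Override
        public void setExpectations(String field, int numTerms,
                                    boolean storeOffsets, boolean storePositions) {
            // nothing to do here; numTerms could be used to pre-size the map
        }

        @Override
        public void map(String term, int frequency,
                        TermVectorOffsetInfo[] offsets, int[] positions) {
            termFreqs.put(term, frequency);
        }
    }

    IndexReader reader = IndexReader.open(index);
    FreqMapper mapper = new FreqMapper();
    reader.getTermFreqVector(docId, "title", mapper); // callbacks fire here
    // mapper.termFreqs now maps each term in "title" to its frequency in docId
    reader.close();

From there you can use the Map directly (or build whatever structure you need
inside map()) rather than walking the getTerms()/getTermFrequencies() arrays a
second time.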


--------------------------
Grant Ingersoll
http://www.lucidimagination.com

