On Oct 20, 2010, at 4:40 PM, Martin O'Shea wrote: > http://mail-archives.apache.org/mod_mbox/lucene-java-user/201010.mbox/%3c128 > 7065863.4cb7110774...@netmail.pipex.net%3e will give you a better idea of > what I'm moving towards. > > It's all a bit grey at the moment so further investigation is inevitable. > > I expect that a combination of MySQL database storage and Lucene indexing is > going to be the end result.
I'd likely take the TermVectorMapper approach, but otherwise, yeah, I think you are on the right track. > > > > -----Original Message----- > From: Grant Ingersoll [mailto:gsing...@apache.org] > Sent: 20 Oct 2010 21 20 > To: java-user@lucene.apache.org > Subject: Re: Using a TermFreqVector to get counts of all words in a document > > > On Oct 20, 2010, at 2:53 PM, Martin O'Shea wrote: > >> Uwe >> >> Thanks - I figured that bit out. I'm a Lucene 'newbie'. >> >> What I would like to know though is if it is practical to search a single >> document of one field simply by doing this: >> >> IndexReader trd = IndexReader.open(index); >> TermFreqVector tfv = trd.getTermFreqVector(docId, "title"); >> String[] terms = tfv.getTerms(); >> int[] freqs = tfv.getTermFrequencies(); >> for (int i = 0; i < tfv.getTerms().length; i++) { >> System.out.println("Term " + terms[i] + " Freq: " + freqs[i]); >> } >> trd.close(); >> >> where docId is set to 0. >> >> The code works but can this be improved upon at all? >> >> My situation is where I don't want to calculate the number of documents > with >> a particular string. Rather I want to get counts of individual words in a >> field in a document. So I can concatenate the strings before passing it to >> Lucene. > > Can you describe the bigger problem you are trying to solve? This looks > like a classic XY problem: http://people.apache.org/~hossman/#xyproblem > > What you are doing above will work OK for what you describe (up to the > "passing it to Lucene" part), but you probably should explore the use of the > TermVectorMapper which provides a callback mechanism (similar to a SAX > parser) that will allow you to build your data structures on the fly instead > of having to serialize them into two parallel arrays and then loop over > those arrays to create some other structure. > > > -------------------------- > Grant Ingersoll > http://www.lucidimagination.com > > > --------------------------------------------------------------------- > To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org > For additional commands, e-mail: java-user-h...@lucene.apache.org > > > > > --------------------------------------------------------------------- > To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org > For additional commands, e-mail: java-user-h...@lucene.apache.org > -------------------------- Grant Ingersoll http://www.lucidimagination.com --------------------------------------------------------------------- To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org For additional commands, e-mail: java-user-h...@lucene.apache.org