Andy,

Enumerating over all of the terms in an index to retrieve the number of instances of each is not going to be a fast operation. What is it that you are trying to accomplish with that data?

Michael

-----Original Message-----
From: Andrew Schuler [mailto:[email protected]]
Sent: Wednesday, September 02, 2009 7:34 AM
To: [email protected]
Subject: Re: enumerating all terms in index

Michael,

I was looking for all the terms in the index and the number of instances of each. I ended up using IR.Terms and TermEnum, but from some of the discussions I saw in my Google search it seemed like that might not be the best (fastest) way to accomplish this. Is this still the accepted best practice?

On Mon, Aug 31, 2009 at 11:38 AM, Michael Garski <[email protected]> wrote:

> Andy,
>
> Are you looking for the number of documents that contain a term, or the
> total number of term instances?
>
> To enumerate over all of the terms in an index, use IndexReader.Terms to
> get a TermEnum to walk through the terms. From there you can use
> IndexReader.DocFreq to get the number of documents that contain a term.
> To find the total number of occurrences of a term, use
> IndexReader.TermDocs to retrieve the frequency of a term within a
> document.
>
> Hope that gets you in the right direction.
>
> Michael
>
> -----Original Message-----
> From: Andrew Schuler [mailto:[email protected]]
> Sent: Friday, August 28, 2009 6:38 PM
> To: [email protected]
> Subject: enumerating all terms in index
>
> This seems pretty straightforward but Google is failing me today.
> What is the generally accepted best (fastest) way to enumerate all the
> terms in an index with the number of times they occur? TIA.
>
> -andy
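
The difference between the two counts discussed in the thread (documents containing a term versus total occurrences of a term) can be sketched with a toy in-memory inverted index. This is not the Lucene.Net API: the function names `build_index`, `doc_freq`, `total_occurrences`, and `enumerate_terms`, and the sample documents, are assumptions made up for illustration. In this toy model the document count is a single lookup per term, while the total count has to visit every posting for that term, which hints at why doing it for every term in a large index is slow.

```python
from collections import Counter, defaultdict


def build_index(docs):
    """Map each term to {doc_id: frequency of the term in that document}."""
    index = defaultdict(dict)
    for doc_id, text in enumerate(docs):
        for term, freq in Counter(text.lower().split()).items():
            index[term][doc_id] = freq
    return index


def doc_freq(index, term):
    """Number of documents containing the term (the role DocFreq plays in the thread)."""
    return len(index.get(term, {}))


def total_occurrences(index, term):
    """Sum of per-document frequencies (the role of walking TermDocs in the thread)."""
    return sum(index.get(term, {}).values())


def enumerate_terms(index):
    """Walk every term in sorted order, yielding (term, doc_freq, total)."""
    for term in sorted(index):
        yield term, doc_freq(index, term), total_occurrences(index, term)


# Hypothetical sample data, not taken from the thread.
docs = ["the cat sat on the mat", "the dog sat", "a cat and a dog"]
index = build_index(docs)

for term, df, total in enumerate_terms(index):
    print(f"{term}: docs={df} occurrences={total}")
```

In this sample, "the" appears in two documents but occurs three times, so the two counts differ whenever a term repeats within a document.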
