Michael,

I'm building something similar to autocomplete on a textbox. As a word is typed, I want a popup to display the number of hits for that word. A stretch goal would be to also list similar words, ranked by number of hits. This is a desktop app, so I think I can load the list of terms at app startup and cache it in memory for quick retrieval.
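
A minimal sketch of that in-memory cache, assuming the (term, hit count) pairs have already been read from the index once at startup. It is written in Python only for brevity. `TermCache`, `hits`, and `suggest` are hypothetical names, not Lucene APIs, and "similar words" is assumed here to mean terms that share the typed prefix:

```python
import bisect


class TermCache:
    """Hypothetical in-memory term -> hit count lookup with prefix suggestions."""

    def __init__(self, term_counts):
        # term_counts: mapping of term -> hit count, loaded once at app startup.
        self._counts = dict(term_counts)
        # A sorted list of terms lets us find a prefix range by binary search.
        self._sorted_terms = sorted(self._counts)

    def hits(self, term):
        """Return the hit count for an exact term, or 0 if it is not indexed."""
        return self._counts.get(term, 0)

    def suggest(self, prefix, limit=5):
        """Return up to `limit` (term, count) pairs starting with `prefix`,
        ranked by count (highest first), ties broken alphabetically."""
        start = bisect.bisect_left(self._sorted_terms, prefix)
        matches = []
        for term in self._sorted_terms[start:]:
            if not term.startswith(prefix):
                break  # sorted order: no later term can match the prefix
            matches.append((term, self._counts[term]))
        matches.sort(key=lambda pair: (-pair[1], pair[0]))
        return matches[:limit]


# Example with made-up counts:
cache = TermCache({"lucene": 12, "luck": 3, "index": 40, "lucid": 7})
print(cache.hits("lucene"))
print(cache.suggest("luc"))
```

The exact lookup is a dictionary access and the prefix lookup is a binary search followed by a short scan, so both should be fast enough to run on every keystroke once the term list is in memory. The one-time cost is the full term enumeration at startup.
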

Andy

On Wed, Sep 2, 2009 at 10:38 AM, Michael Garski <[email protected]> wrote:

> Andy,
>
> Enumerating over all of the terms in an index to retrieve the number of
> instances of each is not going to be a fast operation. What is it that
> you are trying to accomplish with that data?
>
> Michael
>
> -----Original Message-----
> From: Andrew Schuler [mailto:[email protected]]
> Sent: Wednesday, September 02, 2009 7:34 AM
> To: [email protected]
> Subject: Re: enumerating all terms in index
>
> Michael,
>
> I was looking for all the terms in the index and the number of instances
> of each. I ended up using IR.Terms and TermEnum, but from some of the
> discussions I saw in my Google search it seemed like that might not be
> the best (fastest) way to accomplish this. Is this still the accepted
> best practice?
>
>
> On Mon, Aug 31, 2009 at 11:38 AM, Michael Garski
> <[email protected]> wrote:
>
> > Andy,
> >
> > Are you looking for the number of documents that contain a term, or
> > the total number of term instances?
> >
> > To enumerate over all of the terms in an index, use IndexReader.Terms
> > to get a TermEnum to walk through the terms. From there you can use
> > IndexReader.DocFreq to get the number of documents that contain a
> > term. To find the total number of occurrences of a term use
> > IndexReader.TermDocs to retrieve the frequency of a term within a
> > document.
> >
> > Hope that gets you in the right direction.
> >
> > Michael
> >
> > -----Original Message-----
> > From: Andrew Schuler [mailto:[email protected]]
> > Sent: Friday, August 28, 2009 6:38 PM
> > To: [email protected]
> > Subject: enumerating all terms in index
> >
> > This seems pretty straightforward but Google is failing me today.
> > What is the generally accepted best (fastest) way to enumerate all the
> > terms in an index with the number of times they occur? TIA.
> >
> > -andy
