On Thursday 10 November 2005 08:12, Chris Hostetter wrote:
>
> : For example I would like to find the set of terms used within a particular
> : date range, where all Documents have a date field on them. What I've done
> : to date is simply perform a query to find all Documents that match the
> : date range query, then iterate over each one and construct a Set of all
> : terms used in the particular field I'm interested in.
> :
> : I'm wondering if there is a more efficient way to accomplish this?
>
> I believe there is -- provided the terms are indexed.
>
> 1) Get yourself a BitSet representing the Documents you are interested in
> (you mentioned having a date range; you can either use a RangeFilter and
> call the bits method directly, or you can do a search using a
> HitCollector)
>
> 2) Look at the code that actually makes RangeFilter work. It iterates
> over a TermEnum between a low and high value. For each term it finds, it
> uses a TermDocs to record the docid. You could do something very similar,
> looping over all terms in the field you want, but instead of recording
> the docid, add the term to your Set object -- if and only if one of the
> docids from the TermDocs is in your BitSet from step #1 above.
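As a rough illustration of step 2 quoted above, here is a minimal sketch in plain Java. It is only a model, not Lucene code: the sorted map stands in for the terms of one field (what a TermEnum would walk), each int array stands in for a term's posting list (what a TermDocs would return), and the class and method names are made up for the example.

```java
import java.util.BitSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

/**
 * Plain-Java model of the quoted step 2. The TreeMap stands in for the
 * sorted terms of one field, each int[] for that term's posting list.
 * None of this is Lucene API; all names here are hypothetical.
 */
public class TermsInDocSet {

    /** Returns every term that occurs in at least one document set in docs. */
    static Set<String> termsInDocs(Map<String, int[]> postings, BitSet docs) {
        Set<String> result = new TreeSet<>();
        for (Map.Entry<String, int[]> entry : postings.entrySet()) {
            for (int docId : entry.getValue()) {
                if (docs.get(docId)) {
                    result.add(entry.getKey());
                    break; // one hit is enough; skip the rest of this posting list
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Toy inverted index for a single field: term -> ids of docs containing it.
        Map<String, int[]> postings = new TreeMap<>();
        postings.put("apple", new int[] {0, 3});
        postings.put("banana", new int[] {1});
        postings.put("cherry", new int[] {2, 3});

        // Stand-in for the BitSet from step 1 (docs matching the date range).
        BitSet docs = new BitSet();
        docs.set(2);
        docs.set(3);

        System.out.println(termsInDocs(postings, docs)); // prints [apple, cherry]
    }
}
```

In Lucene itself the outer loop would walk the TermEnum for the field and the inner loop the TermDocs of each term, with the BitSet coming from step 1. The cost is then driven by the number of terms in the field rather than by the number of matching documents.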
Once the doc ids are available, an alternative way to get the terms used in a particular field is to read them from the TermVectors of the docs. The field of interest must be indexed with TermVectors for this to work.

Regards,
Paul Elschot