Hello all,

We have an extremely large number of terms in our indexes.  I want to be able 
to extract a sample of the terms, say something like every 128th term.  If I 
use code based on org.apache.lucene.misc.HighFreqTerms or 
org.apache.lucene.index.CheckIndex, I would get a TermEnum, call 
termEnum.next() 128 times, grab the term, then call next() another 128 times, 
and so on:
TermEnum termEnum = reader.terms();
while (termEnum.next())
{ /* termEnum.term() gives the current term */ }
termEnum.close();
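
Spelled out a bit more, the brute-force version I have in mind looks roughly 
like this (written against the pre-flex 3.x IndexReader/TermEnum API; the 
method name sampleTerms and the interval argument are just placeholders):

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.Term;
import org.apache.lucene.index.TermEnum;

// Walk the full term enumeration and keep every Nth term (N = 128 here).
// This still forces a read of the whole .tis file; it is just the
// brute-force approach described above.
public static List<Term> sampleTerms(IndexReader reader, int interval)
    throws IOException {
  List<Term> sample = new ArrayList<Term>();
  TermEnum termEnum = reader.terms();
  try {
    int count = 0;
    while (termEnum.next()) {          // advance to the next term
      if (count % interval == 0) {
        sample.add(termEnum.term());   // keep every interval-th term
      }
      count++;
    }
  } finally {
    termEnum.close();
  }
  return sample;
}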

Since the tii file contains every 128th (or indexInterval-th) term and is 
loaded into memory, is there some programmatic way (in the public API) to read 
that in-memory data structure, rather than having to force Lucene to read the 
entire tis file by calling termEnum.next() over and over?


Tom Burton-West
http://www.hathitrust.org/blogs/large-scale-search
