Also in 2.9.2 and 3.0.1:
http://lucene.apache.org/java/2_9_2/api/all/org/apache/lucene/index/IndexReader.html#getUniqueTermCount()
http://lucene.apache.org/java/3_0_1/api/all/org/apache/lucene/index/IndexReader.html#getUniqueTermCount()

Please note that this works only with SegmentReaders, so you first have to call getSequentialSubReaders(), and you *may* sum up the counts over the sub-readers. But this would not give the correct number, as segments may have (and in most cases do have lots of) overlapping terms. For an optimized index, getSequentialSubReaders() returns a single sub-reader, and its unique term count is correct.

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de

> -----Original Message-----
> From: ysee...@gmail.com [mailto:ysee...@gmail.com] On Behalf Of Yonik Seeley
> Sent: Thursday, May 27, 2010 9:44 PM
> To: java-user@lucene.apache.org
> Subject: Re: How to get the number of unique terms in the inverted index
>
> On Thu, May 27, 2010 at 2:32 PM, kannan chandrasekaran <ckanna...@yahoo.com> wrote:
> > I was wondering if there is a way to retrieve the number of unique terms in Lucene (version 2.4.0) ... I am aware of the terms() and terms(Term) methods that return an enumeration (TermEnum), but that involves iterating through the terms and counting them. I am looking for something similar to numDocs() in the IndexReader class.
>
> No, there is not.
> In 4.0-dev, with the new "flex" APIs, you can retrieve the number of unique terms in a single segment (Terms.getUniqueTermCount()), but not a whole index.
>
> -Yonik
> http://www.lucidimagination.com
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> For additional commands, e-mail: java-user-h...@lucene.apache.org
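
To illustrate why summing per-segment counts overcounts, here is a minimal sketch in plain Java. It does not use the Lucene API: each segment's term dictionary is modelled as a hypothetical Set<String>, and the class and method names (SegmentTermCounts, sumOfPerSegmentCounts, uniqueAcrossIndex) are made up for this example. The union of the sets plays the role of the single merged segment of an optimized index.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Illustration only: plain sets stand in for per-segment term dictionaries.
class SegmentTermCounts {

    // Adds up each segment's own unique-term count. Terms that occur in
    // several segments are counted once per segment.
    static long sumOfPerSegmentCounts(List<Set<String>> segments) {
        long sum = 0;
        for (Set<String> segment : segments) {
            sum += segment.size();
        }
        return sum;
    }

    // Counts unique terms across all segments by taking the union,
    // which is what a single merged segment would contain.
    static long uniqueAcrossIndex(List<Set<String>> segments) {
        Set<String> merged = new HashSet<String>();
        for (Set<String> segment : segments) {
            merged.addAll(segment);
        }
        return merged.size();
    }

    public static void main(String[] args) {
        Set<String> seg1 = new HashSet<String>(
                Arrays.asList("apache", "index", "lucene", "term"));
        Set<String> seg2 = new HashSet<String>(
                Arrays.asList("index", "lucene", "reader"));
        List<Set<String>> segments = Arrays.asList(seg1, seg2);

        // "index" and "lucene" occur in both segments.
        System.out.println(sumOfPerSegmentCounts(segments)); // prints 7
        System.out.println(uniqueAcrossIndex(segments));     // prints 5
    }
}
```

The sum is only an upper bound on the true count. The two numbers agree when no term is shared between segments, or when there is just one segment, as in the optimized case.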