Ahh -- this makes sense. I thought it was too good to be true!
On Tue, Sep 7, 2010 at 4:45 AM, Michael McCandless <[email protected]> wrote:
> This is expected/intentional, because computing the "true" unique term
> count across multiple segments is exceptionally costly (you have to do
> the merge sort to de-dup).
>
> If you really want the true count, you can pull the TermsEnum and
> .next() until exhaustion.
>
> Alternatively, you can use IndexReader.getSequentialSubReaders(), then
> step through each SegReader calling its .getUniqueTermCount() and then
> somehow "approximate" (e.g. the sum will be an upper bound of the total
> unique count).
>
> Mike
>
> On Tue, Sep 7, 2010 at 2:34 AM, Ryan McKinley <[email protected]> wrote:
>> Hello-
>>
>> I'm looking at using the new terms.getUniqueTermCount() to give a
>> quick count for the LukeRequestHandler rather than needing to walk all
>> the terms.
>>
>> When the Solr index reader has just one segment, it works great. However
>> with more segments I get:
>>
>> java.lang.UnsupportedOperationException: this reader does not
>> implement getUniqueTermCount()
>>     at org.apache.lucene.index.Terms.getUniqueTermCount(Terms.java:84)
>>
>> Is this expected? Is there any way around that?
>>
>> I am getting the terms using:
>>
>> Terms terms = MultiFields.getTerms(reader, fieldName);
>> long cnt = (terms == null) ? 0 : terms.getUniqueTermCount();
>>
>> Thanks
>> ryan
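
For anyone who lands on this thread later, here is a rough, untested sketch of Mike's first suggestion (walk the merged TermsEnum to exhaustion to get the exact count). It assumes the current flex/trunk API and follows the names in the code quoted above; the exact iterator() signature may differ depending on your revision:

    import java.io.IOException;

    import org.apache.lucene.index.IndexReader;
    import org.apache.lucene.index.MultiFields;
    import org.apache.lucene.index.Terms;
    import org.apache.lucene.index.TermsEnum;

    // Exact unique-term count for one field: the multi-segment enum does the
    // merge sort, so each term is seen once -- but every term is touched,
    // which can be slow on a large index.
    public static long exactUniqueTermCount(IndexReader reader, String fieldName)
        throws IOException {
      Terms terms = MultiFields.getTerms(reader, fieldName);
      if (terms == null) {
        return 0;
      }
      long count = 0;
      TermsEnum te = terms.iterator();
      while (te.next() != null) {
        count++;
      }
      return count;
    }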

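And a similarly rough sketch of the second suggestion (sum the per-segment counts as an approximation). The sum is only an upper bound, since a term present in several segments is counted once per segment, but it avoids walking the terms; again, method names here follow the quoted code and are untested:

    import java.io.IOException;

    import org.apache.lucene.index.IndexReader;
    import org.apache.lucene.index.MultiFields;
    import org.apache.lucene.index.Terms;

    // Cheap upper bound on the unique-term count: sum each segment's count.
    public static long upperBoundUniqueTermCount(IndexReader reader, String fieldName)
        throws IOException {
      IndexReader[] subs = reader.getSequentialSubReaders();
      if (subs == null) {
        subs = new IndexReader[] { reader };  // reader is already a single segment
      }
      long sum = 0;
      for (IndexReader sub : subs) {
        Terms terms = MultiFields.getTerms(sub, fieldName);
        if (terms != null) {
          sum += terms.getUniqueTermCount();  // supported at the segment level
        }
      }
      return sum;
    }

So the trade-off is exactness versus cost: the first sketch gives the true count but is O(total terms), while the second is essentially free but over-counts terms shared across segments.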