Re: solr getUniqueTermCount() when multiple segments?

Ryan McKinley Tue, 07 Sep 2010 02:49:31 -0700

Ahh -- this makes sense.  I thought it was too good to be true!


On Tue, Sep 7, 2010 at 4:45 AM, Michael McCandless
<[email protected]> wrote:
> This is expected/intentional, because computing the "true" unique term
> count across multiple segments is exceptionally costly (you have to do
> the merge sort to de-dup).
>
> If you really want the true count, you can pull the TermsEnum and
> .next() until exhaustion.
>
> Alternatively, you can use IndexReader.getSequentialSubReaders(), then
> step through each SegmentReader, calling its .getUniqueTermCount(), and
> "approximate" from those (e.g. the sum is an upper bound on the total
> unique count).
>
> Mike
>
> On Tue, Sep 7, 2010 at 2:34 AM, Ryan McKinley <[email protected]> wrote:
>> Hello-
>>
>> I'm looking at using the new terms.getUniqueTermCount() to give a
>> quick count for the LukeRequestHandler rather than needing to walk all
>> the terms.
>>
>> When the Solr index reader has just one segment, it works great.
>> However, with more segments I get:
>>
>> java.lang.UnsupportedOperationException: this reader does not
>> implement getUniqueTermCount()
>>        at org.apache.lucene.index.Terms.getUniqueTermCount(Terms.java:84)
>>
>> Is this expected?  Is there any way around that?
>>
>> I am getting the terms using:
>>
>>          Terms terms = MultiFields.getTerms(reader, fieldName);
>>          long cnt = (terms==null) ? 0 : terms.getUniqueTermCount();
>>
>> Thanks
>> ryan
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: [email protected]
>> For additional commands, e-mail: [email protected]
>>
>>
>
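
A minimal sketch of the two options described above, written in plain Java
rather than against the Lucene API. It assumes each segment can be modelled
as a sorted list of its unique terms; the class and method names
(UniqueTermCountSketch, upperBound, exactCount) are hypothetical and exist
only for illustration. It shows why the per-segment sum is only an upper
bound, and why the exact count needs a merge over all segments.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.PriorityQueue;

// Illustration only: segments are modelled as sorted lists of unique terms.
// None of these names come from Lucene or Solr.
public class UniqueTermCountSketch {

    // Cheap estimate: add up the per-segment unique counts.
    // A term present in several segments is counted once per segment,
    // so the result is an upper bound on the true unique count.
    public static long upperBound(List<List<String>> segments) {
        long sum = 0;
        for (List<String> segment : segments) {
            sum += segment.size();
        }
        return sum;
    }

    // Exact count: k-way merge of the sorted per-segment term lists,
    // counting a term only when it differs from the previous term emitted.
    public static long exactCount(List<List<String>> segments) {
        // Queue entries are {segmentIndex, positionInSegment}, ordered by term.
        PriorityQueue<int[]> queue = new PriorityQueue<>(
            (a, b) -> segments.get(a[0]).get(a[1])
                .compareTo(segments.get(b[0]).get(b[1])));
        for (int i = 0; i < segments.size(); i++) {
            if (!segments.get(i).isEmpty()) {
                queue.add(new int[] {i, 0});
            }
        }
        long count = 0;
        String previous = null;
        while (!queue.isEmpty()) {
            int[] head = queue.poll();
            List<String> segment = segments.get(head[0]);
            String term = segment.get(head[1]);
            if (!term.equals(previous)) {
                count++;
                previous = term;
            }
            // Advance this segment's cursor if it has more terms.
            if (head[1] + 1 < segment.size()) {
                queue.add(new int[] {head[0], head[1] + 1});
            }
        }
        return count;
    }

    public static void main(String[] args) {
        List<List<String>> segments = new ArrayList<>();
        segments.add(Arrays.asList("apple", "banana", "cherry"));
        segments.add(Arrays.asList("banana", "cherry", "date"));
        System.out.println("upper bound: " + upperBound(segments));
        System.out.println("exact count: " + exactCount(segments));
    }
}
```

With the two sample segments, the sum is 6 while the merged count is 4,
because "banana" and "cherry" appear in both. The merge has to visit every
term in every segment, which is the cost that the multi-segment reader
avoids by not implementing getUniqueTermCount().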
