The idea of adding a terms.ttf parameter sounds fine to me. It would also be
good to get terms.list better integrated into the TermsComponent. In
general, I think it's time for more attention to be paid to the
TermsComponent.

Joel Bernstein
http://joelsolr.blogspot.com/

On Wed, Feb 22, 2017 at 4:12 PM, Shai Erera wrote:

> Hmm... so if I want to add totalTermFreq to the response, it will break
> the current output format of TermsComponent, which returns only the
> docFreq for each term. What's our BWC policy for such an API, and is there
> a way to handle it?
>
> I can add a new terms.ttf parameter: if you set it to true, the
> response will look different (each term will have both docFreq and
> totalTermFreq elements), but if you don't, you will get the same response
> as today. Is that acceptable?
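A minimal sketch of how the proposed terms.ttf switch could keep the old
format intact, assuming plain Java maps stand in for Solr's NamedList. The
class TermsTtfSketch, its TermStats holder and formatField() are hypothetical
names for illustration, not Solr APIs, and the counts are toy values.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical model of the proposed terms.ttf parameter. LinkedHashMap stands
// in for Solr's NamedList; none of these names exist in Solr.
class TermsTtfSketch {

    /** Per-term statistics, as a TermContext would hold them. */
    static final class TermStats {
        final int docFreq;
        final long totalTermFreq;

        TermStats(int docFreq, long totalTermFreq) {
            this.docFreq = docFreq;
            this.totalTermFreq = totalTermFreq;
        }
    }

    /**
     * Builds one field's response. With ttf == false each term maps to its
     * docFreq alone (the current format); with ttf == true each term maps to
     * a nested entry holding both docFreq and totalTermFreq.
     */
    static Map<String, Object> formatField(Map<String, TermStats> stats, boolean ttf) {
        Map<String, Object> out = new LinkedHashMap<>();
        for (Map.Entry<String, TermStats> e : stats.entrySet()) {
            TermStats s = e.getValue();
            if (ttf) {
                Map<String, Object> both = new LinkedHashMap<>();
                both.put("docFreq", s.docFreq);
                both.put("totalTermFreq", s.totalTermFreq);
                out.put(e.getKey(), both);
            } else {
                out.put(e.getKey(), s.docFreq); // unchanged, backward-compatible shape
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, TermStats> stats = new LinkedHashMap<>();
        stats.put("t1", new TermStats(3, 5L));
        System.out.println(formatField(stats, false)); // {t1=3}
        System.out.println(formatField(stats, true));  // {t1={docFreq=3, totalTermFreq=5}}
    }
}
```

Because the default path emits exactly what it emits today, existing clients
only see the new shape when they opt in.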
>
> Somewhat related, though it can be handled separately: I noticed that if
> you specify terms.list and multiple terms.fl parameters, you only receive
> stats for the first field (the rest are ignored), but if you don't specify
> terms.list, you get results for all fields. I don't see any reason not to
> support multiple fields with terms.list. What do you think?
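Supporting every terms.fl field with terms.list amounts to an outer loop over
the fields. A minimal sketch, assuming a nested map (field -> term ->
docFreq) stands in for the index; TermsListAllFields and lookup() are
hypothetical names, and the values are toy data.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: look up the same terms.list values in EVERY terms.fl
// field, not only the first. The nested map models the index; it is not Solr code.
class TermsListAllFields {

    static Map<String, Map<String, Integer>> lookup(
            Map<String, Map<String, Integer>> index, // field -> (term -> docFreq)
            List<String> fields,                      // all terms.fl values
            List<String> terms) {                     // the terms.list values
        Map<String, Map<String, Integer>> response = new LinkedHashMap<>();
        for (String field : fields) {                 // iterate over all fields
            Map<String, Integer> fieldIndex = index.getOrDefault(field, Map.of());
            Map<String, Integer> fieldOut = new LinkedHashMap<>();
            for (String term : terms) {
                Integer df = fieldIndex.get(term);
                if (df != null) {                     // omit terms absent from this field
                    fieldOut.put(term, df);
                }
            }
            response.put(field, fieldOut);
        }
        return response;
    }

    public static void main(String[] args) {
        Map<String, Map<String, Integer>> index = new LinkedHashMap<>();
        index.put("title", Map.of("t1", 2));
        index.put("text", Map.of("t1", 7, "t2", 1));
        System.out.println(lookup(index, List.of("title", "text"), List.of("t1", "t2")));
        // {title={t1=2}, text={t1=7, t2=1}}
    }
}
```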
>
> On Wed, Feb 22, 2017 at 10:08 PM Shai Erera wrote:
>
>> Looks like this could be a very easy addition to TermsComponent? From
>> what I read in the code, it uses TermContext to compute/hold the stats, and
>> the latter already has both docFreq and totalTermFreq (!!). It's just that
>> TermsComponent computes TTF but never outputs it:
>>
>>     for (int i = 0; i < terms.length; i++) {
>>       if (termContexts[i] != null) {
>>         String outTerm = fieldType.indexedToReadable(terms[i].bytes().utf8ToString());
>>         int docFreq = termContexts[i].docFreq();
>>         // termContexts[i].totalTermFreq() is available here too, but is not output
>>         termsMap.add(outTerm, docFreq);
>>       }
>>     }
>>
>>
>> On Wed, Feb 22, 2017 at 5:34 PM Joel Bernstein wrote:
>>
>> Yeah, I think expanding the functionality of the terms component looks
>> like the right place to add these stats.
>>
>> I plan on exposing these types of terms stats as Streaming Expression
>> functions but I would likely use the terms component under the covers.
>>
>>
>>
>>
>> On Wed, Feb 22, 2017 at 8:56 AM, Shai Erera wrote:
>>
>> No, they are not global distributed stats. I am willing to live with
>> approximate stats though (unless, again, there's an API that can give me
>> both). I wonder why the Terms component doesn't return ttf in addition to
>> docfreq. The API (at the Lucene level) is already right there.
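To make the two statistics concrete: docFreq counts the documents that
contain a term, while totalTermFreq counts every occurrence of it. Lucene
exposes both per term (for example via TermsEnum.docFreq() and
TermsEnum.totalTermFreq()) and reads them from the index. The toy model
below, with the hypothetical class TermStatsDemo, just computes them over an
in-memory list of already-analyzed documents to show the difference.

```java
import java.util.List;

// Toy illustration only: each "document" is a list of already-indexed terms.
// Lucene stores these statistics in the index rather than recomputing them.
class TermStatsDemo {

    /** docFreq: number of documents containing the term at least once. */
    static int docFreq(List<List<String>> docs, String term) {
        int df = 0;
        for (List<String> doc : docs) {
            if (doc.contains(term)) {
                df++;
            }
        }
        return df;
    }

    /** totalTermFreq: number of occurrences of the term across all documents. */
    static long totalTermFreq(List<List<String>> docs, String term) {
        long ttf = 0;
        for (List<String> doc : docs) {
            for (String t : doc) {
                if (t.equals(term)) {
                    ttf++;
                }
            }
        }
        return ttf;
    }

    public static void main(String[] args) {
        List<List<String>> docs = List.of(
                List.of("t1", "t2", "t1"),
                List.of("t1"),
                List.of("t2"));
        System.out.println(docFreq(docs, "t1"));       // 2
        System.out.println(totalTermFreq(docs, "t1")); // 3
    }
}
```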
>>
>> On Wed, Feb 22, 2017 at 3:49 PM Joel Bernstein wrote:
>>
>> Hi Shai,
>>
>> Do ttf and docfreq return global stats in distributed mode? I wasn't
>> aware that there was a mechanism for aggregating values in the field list.
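If both statistics were gathered per shard, global values could be obtained
by summing them, assuming every document is indexed on exactly one shard.
This is a hypothetical sketch (ShardStatsMerge is not a Solr class and this
is not a claim about what the function queries do). Even the sums are
approximate, since Lucene's docFreq and totalTermFreq can still count deleted
documents that have not been merged away yet.

```java
import java.util.List;

// Hypothetical sketch, not Solr code: merge per-shard term statistics into
// global ones. Summing is valid only if each document lives on one shard.
class ShardStatsMerge {

    /** One shard's statistics for a single field/term pair. */
    static final class Stats {
        final long docFreq;
        final long totalTermFreq;

        Stats(long docFreq, long totalTermFreq) {
            this.docFreq = docFreq;
            this.totalTermFreq = totalTermFreq;
        }

        @Override
        public String toString() {
            return "docFreq=" + docFreq + ", totalTermFreq=" + totalTermFreq;
        }
    }

    static Stats merge(List<Stats> perShard) {
        long df = 0;
        long ttf = 0;
        for (Stats s : perShard) {
            df += s.docFreq;        // document counts add across disjoint shards
            ttf += s.totalTermFreq; // occurrence counts add as well
        }
        return new Stats(df, ttf);
    }

    public static void main(String[] args) {
        Stats global = merge(List.of(new Stats(2, 3), new Stats(1, 4)));
        System.out.println(global); // docFreq=3, totalTermFreq=7
    }
}
```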
>>
>>
>>
>> On Wed, Feb 22, 2017 at 7:18 AM, Shai Erera wrote:
>>
>> Hi
>>
>> I am currently using function queries to obtain these two statistics, as
>> I didn't see a better or more explicit API, and the Terms component only
>> returns docFreq, not totalTermFreq.
>>
>> The way I use the API is to submit requests as follows:
>>
>> curl "http://localhost:8983/solr/mycollection/select?q=*:*&rows=1&fl=ttf(text,'t1'),docfreq(text,'t1')"
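The same request, built from Java with each parameter value URL-encoded so
that '*', ':', '(', ',' and quotes survive any HTTP client. A minimal sketch
using only java.net; FunctionQueryUrl and build() are hypothetical names, and
the code only builds the URI without contacting Solr. Encoding does not
change the analysis behavior: Solr still analyzes the term passed to ttf()
and docfreq().

```java
import java.net.URI;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Minimal sketch: build the function-query URL with encoded parameter values.
// It only constructs the URI; nothing is sent to Solr.
class FunctionQueryUrl {

    static String encode(String value) {
        return URLEncoder.encode(value, StandardCharsets.UTF_8);
    }

    static URI build(String base, String field, String term) {
        // Same fl as the curl example: ttf(field,'term'),docfreq(field,'term')
        String fl = "ttf(" + field + ",'" + term + "'),docfreq(" + field + ",'" + term + "')";
        String query = "q=" + encode("*:*") + "&rows=1&fl=" + encode(fl);
        return URI.create(base + "/select?" + query);
    }

    public static void main(String[] args) {
        System.out.println(build("http://localhost:8983/solr/mycollection", "text", "t1"));
    }
}
```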
>>
>> Today I noticed that it sometimes returns 0 for these stats for existing
>> terms. After debugging and going through the code, I noticed that it
>> performs analysis on the value that's given. So if I provide an already
>> stemmed value, it analyzes the value further and in some cases it results
>> in a non-existing term (and in other cases I get stats for a term I didn't
>> ask for).
>>
>> I want to get the stats of the indexed version of the terms, and that's
>> why I send the already stemmed one. In my case I tried to get the stats for
>> the term 'disguis', which is the stem of 'disguise' and 'disguised'; however,
>> it further analyzed the value to 'disgui' (per the analysis chain), and that
>> term does not exist in the index.
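One plausible mechanism, assuming the field's analysis chain includes a
Porter-style English stemmer: Porter's step 1a strips a trailing 's' (but not
'ss'), so re-stemming the already-stemmed 'disguis' yields 'disgui'. The
sketch implements only that single step in a hypothetical helper; it is not
the full Porter algorithm nor the analyzer of any particular field type.

```java
// Illustration only: Porter step 1a (sses -> ss, ies -> i, ss -> ss, s -> "").
// Applying it to an already-stemmed token shows that stemming is not idempotent.
class Step1aDemo {

    static String step1a(String w) {
        if (w.endsWith("sses")) {
            return w.substring(0, w.length() - 2); // caresses -> caress
        }
        if (w.endsWith("ies")) {
            return w.substring(0, w.length() - 2); // ponies -> poni
        }
        if (w.endsWith("ss")) {
            return w;                              // caress stays caress
        }
        if (w.endsWith("s")) {
            return w.substring(0, w.length() - 1); // cats -> cat
        }
        return w;
    }

    public static void main(String[] args) {
        System.out.println(step1a("disguis")); // disgui
    }
}
```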
>>
>> So, first question: is this the right API to retrieve such
>> statistics? I didn't find another one, but I could have missed it.
>>
>> If it is, why does it analyze the value? I tried to wrap the value in
>> single and double quotes, but of course that does not affect the analysis.
>> Is the analysis intended behavior or a bug?
>>
>> Shai
>>
>>
>>
>>
