Yeah, I think expanding the functionality of the terms component looks like the right place to add these stats.
I plan on exposing these types of terms stats as Streaming Expression functions but I would likely use the terms component under the covers.

Joel Bernstein
http://joelsolr.blogspot.com/

On Wed, Feb 22, 2017 at 8:56 AM, Shai Erera <[email protected]> wrote:

> No, they are not global distributed stats. I am willing to live with
> approximated stats though (unless again, there's an API which can give me
> both). I wonder why doesn't Terms component return ttf in addition to
> docfreq. The API (at the Lucene level) is right there already.
>
> On Wed, Feb 22, 2017 at 3:49 PM Joel Bernstein <[email protected]> wrote:
>
>> Hi Shai,
>>
>> Do ttf and docfreq return global stats in distributed mode? I wasn't
>> aware that there was a mechanism for aggregating values in the field list.
>>
>>
>> Joel Bernstein
>> http://joelsolr.blogspot.com/
>>
>> On Wed, Feb 22, 2017 at 7:18 AM, Shai Erera <[email protected]> wrote:
>>
>> Hi
>>
>> I am currently using function queries to obtain these two statistics, as
>> I didn't see a better or more explicit API and the Terms component only
>> returns docFreq, but not totalTermFreq.
>>
>> The way I use the API is submit requests as follows:
>>
>> curl "http://localhost:8983/solr/mycollection/select?q=*:*&rows=1&fl=ttf(text,'t1'),docfreq(text,'t1')"
>>
>> Today I noticed that it sometimes returns 0 for these stats for existing
>> terms. After debugging and going through the code, I noticed that it
>> performs analysis on the value that's given. So if I provide an already
>> stemmed value, it analyzes the value further and in some cases it results
>> in a non-existing term (and in other cases I get stats for a term I didn't
>> ask for).
>>
>> I want to get the stats of the indexed version of the terms, and that's
>> why I send the already stemmed one. In my case I tried to get the stats for
>> the term 'disguis' which is the stem of 'disguise' and 'disguised', however
>> it further analyzed the value to 'disgui' (per the analysis chain) and that
>> term does not exist in the index.
>>
>> So first question is -- is this the right API to retrieve such
>> statistics? I didn't find another one, but could be I missed it.
>>
>> If it is, why does it analyze the value? I tried to wrap the value with
>> single and double quotes, but of course that does not affect the analysis
>> ... is analysis an intended behavior or a bug?
>>
>> Shai
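
For illustration only, here is a minimal Python sketch of how the function-query request quoted in the thread could be assembled with proper URL encoding. The helper name build_stats_url is hypothetical and not part of Solr; the host, collection, field and term values are the ones from the curl example. It only builds the URL and does not contact a server.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def build_stats_url(base_url, collection, field, term):
    """Hypothetical helper: build a /select URL requesting ttf and docfreq
    for a single term through function queries in the field list (fl)."""
    params = {
        "q": "*:*",   # match all documents, as in the quoted curl request
        "rows": 1,    # one row is enough; the stats repeat on every document
        "fl": f"ttf({field},'{term}'),docfreq({field},'{term}')",
    }
    # urlencode percent-encodes the quotes, commas and parentheses in fl
    return f"{base_url}/{collection}/select?{urlencode(params)}"


url = build_stats_url("http://localhost:8983/solr", "mycollection", "text", "t1")

# Decode the query string again to confirm the fl value survives encoding
query = parse_qs(urlsplit(url).query)
print(url)
print(query["fl"][0])
```

Note that encoding the request this way does not change the behaviour described in the thread: the term passed to ttf() and docfreq() is still run through the field's analysis chain on the Solr side, so an already stemmed value may be stemmed again.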
