[ 
https://issues.apache.org/jira/browse/SOLR-12582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16553508#comment-16553508
 ] 

Hoss Man commented on SOLR-12582:
---------------------------------

I'm not an expert on streaming expressions, and i have no first hand 
familiarity with the significantTerms streaming source – but from what i can 
tell they are only loosely related.

the significantTerms streaming source is somewhat comparable to a field facet 
sorted by the relatedness() function – but it seems to have the normal 
constraints of any streaming expression in terms of the source data fields, and 
how data for the entire collection is stream processed on a single node – the 
relatedness() aggregation function isn't quite as limited, but also probably 
not as powerful when that "stream the entire collection" use case is what you 
want.

as noted, relatedness() supports configuring arbitrary foreground/background 
queries, but it can also be used on any facet type – not just "term" faceting, 
so you can use it to score the buckets from any arbitrary facet (including 
range facets or facet queries) and in particular deal with sub facets.
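
for illustration, here is a rough sketch of what "relatedness() on a non-term 
facet, with a sub facet" could look like as a JSON Facet request body. The 
field names ({{age_i}}, {{hobby_s}}), the range bounds, and the helper name are 
all made up for the example; only the {{relatedness($fore,$back)}} call and the 
general facet structure are meant to reflect the JSON Facet API, and this is 
not tested against any particular Solr version.

```python
import json


def build_relatedness_request(fore, back="*:*"):
    """Build a JSON Facet request body (hypothetical fields and bounds)."""
    relatedness = "relatedness($fore,$back)"
    return {
        "query": "*:*",
        "limit": 0,  # only the facet buckets are of interest here
        # fore/back are ordinary request params referenced as $fore / $back
        "params": {"fore": fore, "back": back},
        "facet": {
            # a range facet (not a terms facet) whose buckets get scored
            "by_age": {
                "type": "range",
                "field": "age_i",   # hypothetical field
                "start": 0,
                "end": 100,
                "gap": 20,
                "facet": {
                    "r": relatedness,
                    # a terms sub facet, sorted by its own relatedness score
                    "hobbies": {
                        "type": "terms",
                        "field": "hobby_s",  # hypothetical field
                        "sort": {"r": "desc"},
                        "facet": {"r": relatedness},
                    },
                },
            }
        },
    }


if __name__ == "__main__":
    # Print the body that would be POSTed to a collection's /query handler.
    print(json.dumps(build_relatedness_request("status_s:active"), indent=2))
```

the point of the sketch is only that the same aggregation string can be hung 
off any bucket-producing facet, at any nesting depth.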

what does that all mean in terms of what we should say about one vs the other 
in documentation? ...  i dunno.  I agree there should probably be some cross 
linking between the documentation to help draw awareness of the two for folks 
who find one when the other might be more appropriate, but i'm not sure what 
form that should take (hence filing this issue rather than just making the 
change myself).

as far as trying to maintain the same option names – i don't know that that is 
feasible or really makes sense – at least insofar as adding new options to 
relatedness() using the same names as the existing options on significantTerms. 
 notably the existing {{minDocFreq}} option on significantTerms is similar _in 
concept_ to the {{min_popularity}} option proposed in SOLR-12581 for 
relatedness(), but it would not really make sense to use {{minDocFreq}} as the 
option name in SOLR-12581 since the relatedness() function isn't tied to 
"terms" the way significantTerms is -- so "docFreq" has no real meaning, and 
the more general "popularity" makes more sense (i suppose we could change the 
option name in significantTerms -- but even then significantTerms doesn't 
produce the same concept of "popularity" that relatedness() does, and even if it 
did, because that expression focuses exclusively on "terms" the concept of 
"DocFreq" is very appropriate).

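to make the naming difference concrete, a small side-by-side sketch. The 
collection, query, field, and threshold values are placeholders, and both 
helper names are hypothetical. The object form of relatedness() carrying a 
{{min_popularity}} key is an assumption based on the SOLR-12581 proposal, not 
something this issue confirms.

```python
def significant_terms_expr(collection, query, field, min_doc_freq):
    """Render a significantTerms streaming expression with minDocFreq."""
    return (
        f'significantTerms({collection}, q="{query}", '
        f'field="{field}", minDocFreq="{min_doc_freq}")'
    )


def relatedness_with_min_popularity(min_popularity):
    """Assumed object form of relatedness() with the proposed option."""
    return {
        "type": "func",
        "func": "relatedness($fore,$back)",
        "min_popularity": min_popularity,  # name proposed in SOLR-12581
    }


if __name__ == "__main__":
    # term-centric threshold: a document frequency for terms in one field
    print(significant_terms_expr("collection1", "body:Solr", "author", 10))
    # bucket-centric threshold: applies to whatever bucket is being scored
    print(relatedness_with_min_popularity(0.001))
```

the first threshold is phrased in terms of a term's document frequency, while 
the second is attached to an aggregation that may be scoring range or query 
buckets, which is why a shared option name would not mean the same thing in 
both places.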

 

> Consider api/documentation synergies/overlap between JSON Faceting 
> relatedness() function and significantTerms streaming expression
> ----------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: SOLR-12582
>                 URL: https://issues.apache.org/jira/browse/SOLR-12582
>             Project: Solr
>          Issue Type: Improvement
>      Security Level: Public(Default Security Level. Issues are Public) 
>            Reporter: Hoss Man
>            Priority: Major
>
> In SOLR-12581, Alexandre asked the tangential question below, which i've spun 
> off into its own jira...
> {quote}
> Sort of a side-question, but this work _\[adding new options to the JSON 
> faceting relatedness() aggregation\]_ seems to overlap/complement the 
> significantTerms work done for streaming/QueryParser: 
> http://lucene.apache.org/solr/guide/7_4/stream-source-reference.html#significantterms
> Are we saying SignificantTerms is for simpler use cases (as fore/back queries 
> are corpus-wide) and then go into relatedness() for more complex analysis? 
> Should the options be roughly compatible where it makes sense and/or 
> similarly named?
> Just wondering because I could see this confusing newbies trying to see when 
> to use which option.
> {quote}


