[ https://issues.apache.org/jira/browse/SOLR-12582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16553508#comment-16553508 ]
Hoss Man commented on SOLR-12582:
---------------------------------
I'm not an expert on streaming expressions, and I have no first-hand
familiarity with the significantTerms streaming source, but from what I can
tell they are only orthogonally related.
The significantTerms streaming source is somewhat comparable to a field facet
sorted by the relatedness() function, but it seems to have the normal
constraints of any streaming expression, both in terms of the source data
fields and in how data for the entire collection is stream-processed on a
single node. The relatedness() aggregation function isn't quite as limited,
but it is also probably not as powerful when that "stream the entire
collection" use case is what you want.
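To make that comparison concrete, here is a minimal Python sketch that only builds the two request shapes side by side; it does not send anything to Solr. The collection name, field, and queries are hypothetical placeholders, and the parameter spellings are modeled on the Solr Reference Guide examples, so treat them as assumptions to verify against the guide for your Solr version. The point is structural: significantTerms takes a single foreground query `q` and scores terms against the whole collection as the background, while the JSON facet version passes both the foreground and the background as arbitrary queries.

```python
import json

# Hypothetical placeholders: collection, field and query are illustrative only.
COLLECTION = "collection1"
FIELD = "author"
FOREGROUND = "body:Solr"


def significant_terms_expr(collection: str, q: str, field: str,
                           limit: int, min_doc_freq: int) -> str:
    """Build a significantTerms(...) streaming expression string.

    Only the foreground query q is configurable; the background is the corpus.
    """
    return (
        f'significantTerms({collection}, q="{q}", field="{field}", '
        f'limit="{limit}", minDocFreq="{min_doc_freq}")'
    )


def relatedness_terms_facet(fore: str, back: str, field: str, limit: int) -> dict:
    """Build a JSON Facet request body: a terms facet sorted by relatedness().

    Both the foreground and the background are arbitrary queries passed as params.
    """
    return {
        "query": "*:*",
        "params": {"fore": fore, "back": back},
        "facet": {
            "top_terms": {
                "type": "terms",
                "field": field,
                "limit": limit,
                "sort": {"r": "desc"},
                "facet": {"r": "relatedness($fore,$back)"},
            }
        },
    }


expr = significant_terms_expr(COLLECTION, FOREGROUND, FIELD, limit=50, min_doc_freq=10)
body = relatedness_terms_facet(FOREGROUND, "*:*", FIELD, limit=50)
print(expr)
print(json.dumps(body, indent=2))
```

Swapping `"*:*"` for any other background query is all it takes on the relatedness() side; the significantTerms side has no equivalent knob.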
As noted, relatedness() supports configuring arbitrary foreground/background
queries, but it can also be used on any facet type, not just "terms" faceting.
So you can use it to score the buckets from any arbitrary facet (including
range facets or facet queries) and, in particular, deal with sub-facets.
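As a sketch of that flexibility, the following builds (but does not send) a JSON Facet request that computes relatedness() for every bucket of a range facet and also sorts a nested terms sub-facet by relatedness() inside each range bucket. The field names `age_i`, `hobbies_s` and `state_s` and the queries are hypothetical. Range buckets are returned in range order (as far as I know range facets do not take a `sort`), so relatedness() is used there as a per-bucket statistic rather than as a sort key.

```python
import json


# Hypothetical schema: "age_i" is an int field, "hobbies_s" and "state_s" string fields.
def range_facet_with_relatedness(fore: str, back: str) -> dict:
    """Build (but do not send) a JSON Facet request body.

    Every range bucket gets a relatedness() stat, and every range bucket
    also carries a terms sub-facet whose buckets are sorted by relatedness().
    """
    return {
        "query": "*:*",
        "params": {"fore": fore, "back": back},
        "facet": {
            "ages": {
                "type": "range",
                "field": "age_i",
                "start": 0,
                "end": 100,
                "gap": 10,
                "facet": {
                    # Per-bucket stat: range buckets stay in range order.
                    "r": "relatedness($fore,$back)",
                    # Sub-facet scored and sorted within each range bucket.
                    "hobbies": {
                        "type": "terms",
                        "field": "hobbies_s",
                        "limit": 5,
                        "sort": {"r": "desc"},
                        "facet": {"r": "relatedness($fore,$back)"},
                    },
                },
            }
        },
    }


body = range_facet_with_relatedness(fore="state_s:NV", back="*:*")
print(json.dumps(body, indent=2))
```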
What does that all mean in terms of what we should say about one vs. the other
in the documentation? ... I dunno. I agree there should probably be some
cross-linking between the two sets of documentation, to make folks who find
one aware of the other when the other might be more appropriate, but I'm not
sure what form that should take (hence filing this issue rather than just
making the change myself).
As far as trying to maintain the same option names: I don't know that that is
feasible or really makes sense, at least insofar as adding new options to
relatedness() using the same names as the existing options on significantTerms.
Notably, the existing {{minDocFreq}} option on significantTerms is similar _in
concept_ to the {{min_popularity}} option proposed in SOLR-12581 for
relatedness(), but it would not really make sense to use {{minDocFreq}} as the
option name in SOLR-12581, since the relatedness() function isn't tied to
"terms" the way significantTerms is; "docFreq" has no real meaning there, and
the more general "popularity" makes more sense. (I suppose we could change the
option name in significantTerms, but even then significantTerms doesn't
produce the same concept of "popularity" that relatedness() does, and even if it
did, because that expression focuses exclusively on "terms", the concept of
"docFreq" is very appropriate there.)
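The naming argument can be illustrated with a toy, pure-Python example that is not Solr code: a "docFreq" only makes sense for a bucket defined by a term, whereas a popularity-style ratio can be computed for any bucket, such as an age range. The documents, field names, and the `popularity()` definition (bucket size divided by set size) are simplified, hypothetical stand-ins, not the exact formula relatedness() uses.

```python
from typing import Callable, Dict, List

# Hypothetical toy documents; field names and values are illustrative only.
Doc = Dict[str, object]
DOCS: List[Doc] = [
    {"id": 1, "author": "alice", "age": 23},
    {"id": 2, "author": "bob", "age": 37},
    {"id": 3, "author": "alice", "age": 41},
    {"id": 4, "author": "carol", "age": 58},
]


def doc_freq(docs: List[Doc], field: str, term: str) -> int:
    """Term-centric: how many documents contain `term` in `field`."""
    return sum(1 for d in docs if d.get(field) == term)


def popularity(docs: List[Doc], in_bucket: Callable[[Doc], bool]) -> float:
    """Bucket-centric (simplified, NOT Solr's exact formula):
    fraction of `docs` that fall into an arbitrary bucket."""
    return sum(1 for d in docs if in_bucket(d)) / len(docs) if docs else 0.0


# A term bucket has both a docFreq and a popularity ...
print(doc_freq(DOCS, "author", "alice"))                   # 2
print(popularity(DOCS, lambda d: d["author"] == "alice"))  # 0.5
# ... but a range bucket (age in [30, 50)) has no "term", only a popularity.
print(popularity(DOCS, lambda d: 30 <= d["age"] < 50))     # 0.5
```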
> Consider api/documentation synergies/overlap between JSON Faceting
> relatedness() function and significantTerms streaming expression
> ----------------------------------------------------------------------------------------------------------------------------------
>
> Key: SOLR-12582
> URL: https://issues.apache.org/jira/browse/SOLR-12582
> Project: Solr
> Issue Type: Improvement
> Security Level: Public(Default Security Level. Issues are Public)
> Reporter: Hoss Man
> Priority: Major
>
> In SOLR-12581, Alexandre asked the tangential question below, which I've spun
> off into its own Jira issue...
> {quote}
> Sort of a side-question, but this work _\[adding new options to the JSON
> faceting relatedness() aggregation\]_ seems to overlap/complement the
> significantTerms work done for streaming/QueryParser:
> http://lucene.apache.org/solr/guide/7_4/stream-source-reference.html#significantterms
> Are we saying SignificantTerms is for simpler use cases (as fore/back queries
> are corpus-wide) and then go into relatedness() for more complex analysis?
> Should the options be roughly compatible where it makes sense and/or
> similarly named?
> Just wondering because I could see this confusing newbies trying to see when
> to use which option.
> {quote}
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)