[
https://issues.apache.org/jira/browse/SOLR-14996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17325093#comment-17325093
]
Michael Gibney commented on SOLR-14996:
---------------------------------------
[~Hronom] if I understand correctly what you're trying to do, I actually don't
think the tag/ex is the right way to do it. Please forgive me reading between
the lines (and correct me if I'm wrong), but: it looks like you have multiple
docs per {{user_id}}, and each {{user_id}} has 1 or more associated
{{job_type}} values recorded across those (potentially multiple) docs.
Depending on what your schema (actual data -- I'm not talking about
{{schema.xml}}) looks like, you might be able to achieve what you want by using
a {{!join}} query. (Specifically, I think the approach I'm suggesting would
work if you can guarantee that the multiple docs for the same {{user_id}}
will not contain the same {{job_type}} mapping). Basically something like:
{code}{!join from=user_id to=user_id v='job_type:thinker'}
{code}
Faceting on {{job_type}} for the above domain (assuming validity of the
schema-related assumptions) should get you the facet counts you want. Note,
your {{numFound}} in this case will be high, because the domain would by design
contain multiple docs per {{user_id}}. If you want {{numFound}} for the domain
as deduped by {{user_id}}, your best option would probably be to use the JSON
Facet {{unique}} aggregate function?
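To make the suggestion above concrete, here is a minimal sketch of how the request parameters might be assembled, assuming the host, port, and collection from the issue's example query. The helper name and the facet labels {{unique_users}} and {{job_types}} are made up for illustration, and nothing is actually sent to Solr:

```python
import json
from urllib.parse import urlencode


def build_join_facet_params(match="job_type:thinker", key="user_id",
                            facet_field="job_type"):
    """Assemble Solr request params (hypothetical helper, not part of Solr)."""
    # Self-join on user_id: the domain becomes every doc whose user_id
    # appears on at least one doc matching `match`.
    join_fq = "{!join from=%s to=%s v='%s'}" % (key, key, match)
    json_facet = {
        # Number of distinct user_id values in the joined domain.
        "unique_users": "unique(%s)" % key,
        # Per-job_type doc counts over the joined domain.
        "job_types": {"type": "terms", "field": facet_field},
    }
    return {
        "q": "*:*",
        "fq": join_fq,
        "rows": 0,
        "json.facet": json.dumps(json_facet),
    }


params = build_join_facet_params()
# Base URL taken from the example query in the issue description.
url = "http://localhost:8981/solr/test/select?" + urlencode(params)
```

With something like this, {{unique_users}} would stand in for the deduped {{numFound}}, while {{numFound}} itself still counts every doc in the joined domain.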
wrt the way you were trying to use collapse/tag/ex, it looks like collapse gets
re-applied over the domain with the "selected" tag excluded; in which case the
"selected" tag is doing nothing. For the facet domain, collapse _does_ get
re-applied (over an unrestricted domain), but since the collapse post-filter
doesn't define an ordering for preferring which doc to use as "the" doc for a
{{user_id}} cluster, the output is essentially arbitrary wrt anything you're
likely to regard as relevant. (I note that the facet counts add to exactly 1000
-- presumably the cardinality of {{\*:*}})?
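To illustrate the point about ordering, here is a toy model in plain Python (not Solr's actual implementation; the data and function names are invented for the example). It keeps one doc per {{user_id}} using two different "pick" rules and compares the resulting facet counts with the join-style domain:

```python
from collections import Counter

# Hypothetical toy data: one (user_id, job_type) pair per doc.
DOCS = [
    ("u1", "thinker"), ("u1", "runner"),
    ("u2", "runner"),
    ("u3", "thinker"), ("u3", "ninja"), ("u3", "digger"),
]


def collapse(docs, pick):
    """Keep one doc per user_id; `pick` chooses the group's representative."""
    groups = {}
    for doc in docs:
        groups.setdefault(doc[0], []).append(doc)
    return [pick(group) for group in groups.values()]


def facet(docs):
    """Count docs per job_type."""
    return Counter(job_type for _, job_type in docs)


def join_domain(docs, job_type):
    """All docs whose user_id has at least one doc with `job_type`."""
    users = {user for user, job in docs if job == job_type}
    return [doc for doc in docs if doc[0] in users]


# Two equally "valid" collapse outcomes when no ordering is defined.
first = facet(collapse(DOCS, pick=lambda group: group[0]))
last = facet(collapse(DOCS, pick=lambda group: group[-1]))

# Join-style domain: every doc for users that have a "thinker" doc.
joined = join_domain(DOCS, "thinker")
joined_facet = facet(joined)
unique_users = len({user for user, _ in joined})
```

In this toy model {{first}} and {{last}} disagree on the per-value counts, yet both sum to the number of distinct users (3), whereas the joined domain counts each user once per {{job_type}} and reports 2 unique users across 5 docs.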
> Facet incorrect counts when FQ exclusion applied with collapsing
> ----------------------------------------------------------------
>
> Key: SOLR-14996
> URL: https://issues.apache.org/jira/browse/SOLR-14996
> Project: Solr
> Issue Type: Bug
> Components: faceting
> Affects Versions: 8.6.3
> Reporter: Yevhen Tienkaiev
> Priority: Critical
>
> *numFound* does not match the facet counts shown with exclusion when
> collapsing is used together with a tagged FQ.
> Here is an example query:
> {code}
> curl --location --request GET \
> 'http://localhost:8981/solr/test/select?facet.field={!ex=selected}job_type&facet=on&fq={!collapse%20field=user_id}&fq={!tag=selected}job_type:thinker&q=*:*&rows=0'
> {code}
> result is:
> {code}
> {
>   "responseHeader": {
>     "zkConnected": true,
>     "status": 0,
>     "QTime": 15,
>     "params": {
>       "q": "*:*",
>       "facet.field": "{!ex=selected}job_type",
>       "fq": [
>         "{!collapse field=user_id}",
>         "{!tag=selected}job_type:thinker"
>       ],
>       "rows": "0",
>       "facet": "on"
>     }
>   },
>   "response": {
>     "numFound": 850,
>     "start": 0,
>     "maxScore": 1.0,
>     "numFoundExact": true,
>     "docs": []
>   },
>   "facet_counts": {
>     "facet_queries": {},
>     "facet_fields": {
>       "job_type": [
>         "runner",
>         220,
>         "developer",
>         202,
>         "digger",
>         202,
>         "thinker",
>         195,
>         "ninja",
>         181
>       ]
>     },
>     "facet_ranges": {},
>     "facet_intervals": {},
>     "facet_heatmaps": {}
>   }
> }
> {code}
> As you can see, there is an FQ with
> {code}
> {!tag=selected}job_type:thinker
> {code}
> and facets with
> {code}
> {!ex=selected}job_type
> {code}
> In the results I see 195 for *thinker*, but *numFound* is 850.
> Expected:
> *thinker* 195, *numFound* is 195
> *or*
> *thinker* 850, *numFound* is 850
> You can use this simple project to reproduce the issue
> https://github.com/Hronom/solr-cloud-basic-auth/tree/main/solr-cloud-playground-collapsing