[
https://issues.apache.org/jira/browse/SOLR-11159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16103386#comment-16103386
]
Yonik Seeley commented on SOLR-11159:
-------------------------------------
I don't see any incorrect bucket counts, just a missing value of "E"?
Refinement works like the following:
phase 1) collect the top N buckets from each shard and find the global "top N"
buckets
phase 2) correct the counts of this global "top N" by requesting counts from
shards that didn't provide a value for each bucket
So while we guarantee correct counts for the buckets that are returned, we
don't guarantee that a value won't be missed altogether.
To increase the chances that we get the true global top N, we normally
overrequest on phase 1. But in your example, you explicitly disabled
overrequest.
To fix, simply remove the "overrequest:0" part of your request.
For other requests, you can increase the overrequest value to reduce or
eliminate the chance of missing buckets.
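
To make the two phases concrete, here is a minimal Python sketch that replays them on the per-shard counts from the report quoted below. It is a simplified model, not Solr's actual implementation: the function names are hypothetical, and breaking count ties by term in ascending order is an assumption. Under those assumptions it returns the same buckets the reporter saw, and shows "E" coming back once phase 1 overrequests.

```python
# Simplified model of distributed terms-facet refinement.
# Assumption: ties on count are broken by term, ascending.

# Per-shard counts taken from the report (keyed by node port).
SHARDS = [
    {"C": 1},                          # 8987
    {"C": 4, "D": 1, "E": 1, "A": 1},  # 8983
    {"E": 2, "A": 1, "D": 1},          # 8985
]


def top_buckets(counts, n, descending=True):
    """Return the first n (value, count) pairs ordered by count, then by term."""
    sign = -1 if descending else 1
    ordered = sorted(counts.items(), key=lambda kv: (sign * kv[1], kv[0]))
    return ordered[:n]


def refined_facet(shards, limit, overrequest=0, descending=True):
    """Hypothetical helper that mimics the two refinement phases."""
    # Phase 1: every shard returns its own top (limit + overrequest) buckets,
    # and the partial counts are merged.
    per_shard = [
        dict(top_buckets(shard, limit + overrequest, descending))
        for shard in shards
    ]
    merged = {}
    for returned in per_shard:
        for val, count in returned.items():
            merged[val] = merged.get(val, 0) + count

    # Phase 2: for the merged top `limit` buckets, ask each shard that did not
    # return the bucket in phase 1 for its count, then re-sort.
    candidates = [val for val, _ in top_buckets(merged, limit, descending)]
    for val in candidates:
        for shard, returned in zip(shards, per_shard):
            if val not in returned:
                merged[val] += shard.get(val, 0)

    return top_buckets(merged, limit, descending)


if __name__ == "__main__":
    # overrequest:0 -> "E" never reaches the merged top N, so it is never refined.
    print(refined_facet(SHARDS, limit=2, overrequest=0))
    # With some overrequest, every shard reports "E" and it ranks second.
    print(refined_facet(SHARDS, limit=2, overrequest=2))
```

In this model, shard 8983 ranks "E" below its top 2, so after phase 1 "E" has a merged count of only 2 and falls outside the global top 2. Refinement corrects the counts of "C" and "A" but never asks about "E".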
> Facet buckets count still incorrect after passing {refine:true} | SOLR-7542
> ---------------------------------------------------------------------------
>
> Key: SOLR-11159
> URL: https://issues.apache.org/jira/browse/SOLR-11159
> Project: Solr
> Issue Type: Bug
> Security Level: Public(Default Security Level. Issues are Public)
> Components: Facet Module
> Reporter: Amrit Sarkar
> Attachments: COUNT_DESC_LIMIT_2, COUNT_DESC_LIMIT_3, DOCS
>
>
> I was experimenting with and analysing the new *Refinement* feature in the
> JSON Facet API introduced in SOLR-7452, passing {{refine:true}} with the
> facet definition.
> I am listing the test scenarios along with the test data:
> A 3-shard collection on 3 nodes
> node/shard: bucketVal - count
> 8987: C - 1
> 8983: C - 4 D - 1 E - 1 A - 1
> 8985: E - 2 A - 1 D - 1
> Total: BUCKETS
> C - 5 E - 3 D - 2 A - 2
> It is giving accurate results for COUNT ASC, LIMIT 1 - 4
> {code}
> curl http://localhost:8983/solr/collection1/select -d 'q=*:*&json.facet={cat_s:{type:terms,field:cat_s,sort:"count asc",limit:1,overrequest:0,refine:true}}&wt=json&indent=true'
> {code}
> {code}
> "facets":{
> "count":12,
> "cat_s":{
> "buckets":[{
> "val":"A",
> "count":2}]}}}
> {code}
> {code}
> curl http://localhost:8983/solr/collection1/select -d 'q=*:*&json.facet={cat_s:{type:terms,field:cat_s,sort:"count asc",limit:2,overrequest:0,refine:true}}&wt=json&indent=true'
> {code}
> {code}
> "facets":{
> "count":12,
> "cat_s":{
> "buckets":[{
> "val":"A",
> "count":2},
> {
> "val":"D",
> "count":2}]}}}
> {code}
> *BUT, COUNT DESC, LIMIT 2 and 3*
> {code}
> curl http://localhost:8983/solr/collection1/select -d 'q=*:*&json.facet={cat_s:{type:terms,field:cat_s,sort:"count desc",limit:2,overrequest:0,refine:true}}&wt=json&indent=true'
> {code}
> {code}
> "facets":{
> "count":12,
> "cat_s":{
> "buckets":[{
> "val":"C",
> "count":5},
> {
> "val":"A",
> "count":2}]}}}
> {code}
> {code}
> curl http://localhost:8983/solr/collection1/select -d 'q=*:*&json.facet={cat_s:{type:terms,field:cat_s,sort:"count desc",limit:3,overrequest:0,refine:true}}&wt=json&indent=true'
> {code}
> {code}
> "facets":{
> "count":12,
> "cat_s":{
> "buckets":[{
> "val":"C",
> "count":5},
> {
> "val":"A",
> "count":2},
> {
> "val":"D",
> "count":2}]}}}
> {code}
> *bucketVal {{E}} and its count {{3}} are missing from the facet response.*
> Pardon me if I am missing some configuration or if this behavior is correct
> and expected. Ideally we should see bucketVal E with count 3.
> I am attaching the indexed DOCS and the debugQuery output for COUNT DESC
> with LIMIT 2 and LIMIT 3.