    [ https://issues.apache.org/jira/browse/SOLR-8768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15181308#comment-15181308 ]

Marius Grama commented on SOLR-8768:
------------------------------------

While debugging, I stumbled upon the following code in the FacetField class:

{code:title=FacetField.java}
  protected SimpleOrderedMap<Object> findTopSlots() throws IOException {
    // ....
    
    // add a modest amount of over-request if this is a shard request
    int lim = freq.limit >= 0
        ? (fcontext.isShard() ? (int)(freq.limit*1.1+4) : (int)freq.limit)
        : Integer.MAX_VALUE;

    int maxsize = (int)(freq.limit >= 0 ? freq.offset + lim : Integer.MAX_VALUE - 1);
    maxsize = Math.min(maxsize, nTerms);
    
    // ....
{code}

As the code above shows, with a limit of 2 each shard retrieves only the top 6
slots ((int)(2*1.1 + 4) = 6). (When the limit is not specified explicitly, the
FacetField.limit field defaults to 10, which corresponds to the top 15 slots:
(int)(10*1.1 + 4) = 15.)
For this reason, the two documents for Tyrion with the value 1 in the
_seller_measure_ field are not taken into account when calculating the
_top_sellers_ buckets. This is why the result bucket for Tyrion in the text
above contains 100 instead of the expected value of 102.
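To make the arithmetic concrete, here is a minimal Java sketch that copies the over-request expressions from the snippet above into standalone methods. The class `OverRequest` and its methods are hypothetical helpers written for illustration; they are not part of Solr, and they take plain `long`/`boolean`/`int` arguments in place of `freq` and `fcontext`.

```java
// Hypothetical helper mirroring the over-request arithmetic quoted from
// FacetField.java; this is NOT a Solr API, just the same expressions.
final class OverRequest {

    /** Per-shard limit: limit*1.1+4 on a shard request, unchanged otherwise. */
    static int effectiveLimit(long limit, boolean isShard) {
        if (limit < 0) {
            return Integer.MAX_VALUE; // a negative limit means "no limit"
        }
        // The double result is truncated toward zero by the (int) cast.
        return isShard ? (int) (limit * 1.1 + 4) : (int) limit;
    }

    /** Number of top slots collected: offset + lim, capped at the term count. */
    static int maxSize(long limit, long offset, boolean isShard, int nTerms) {
        int lim = effectiveLimit(limit, isShard);
        int maxsize = (int) (limit >= 0 ? offset + lim : Integer.MAX_VALUE - 1);
        return Math.min(maxsize, nTerms);
    }

    public static void main(String[] args) {
        System.out.println(effectiveLimit(2, true));  // prints 6
        System.out.println(effectiveLimit(10, true)); // prints 15
        System.out.println(maxSize(2, 0, true, 3));   // prints 3 (only 3 terms exist)
    }
}
```

With a limit of 2 a shard collects 6 slots, and with the default limit of 10 it collects 15; in both cases the count is further capped by the number of distinct terms in the shard.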

This is also why the result was valid in the sample provided for this ticket:
each shard collects 6 top slots, which is more than the 3 documents stored in
the shard.
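The effect of per-shard truncation can be reproduced with a small simulation. The sketch below is hypothetical and deliberately simplified: each shard keeps only its top N terms ordered by sum(seller_measure), the coordinator adds up whatever the shards returned, and no refinement step is modelled. The class `ShardTruncation` and its methods are illustrative names, not Solr code. It uses the per-shard sums from the data in the ticket: with N = 6 every term survives and Tyrion's total is exact, while a hypothetical cutoff of N = 2 (smaller than what Solr would request for limit 2) drops Tyrion's two values of 1 from sellers_2014 and sellers_2016.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical simulation of distributed term faceting sorted by a sum metric.
// Each shard returns only its top-N terms; the coordinator sums what it gets.
final class ShardTruncation {

    /** Top n entries of one shard's per-term sums, highest sum first. */
    static Map<String, Double> topN(Map<String, Double> shard, int n) {
        Map<String, Double> top = new LinkedHashMap<>();
        shard.entrySet().stream()
                .sorted(Map.Entry.<String, Double>comparingByValue().reversed())
                .limit(n)
                .forEach(e -> top.put(e.getKey(), e.getValue()));
        return top;
    }

    /** Merges the truncated shard responses by adding per-term sums. */
    static Map<String, Double> merge(List<Map<String, Double>> shards, int perShardLimit) {
        Map<String, Double> merged = new HashMap<>();
        for (Map<String, Double> shard : shards) {
            topN(shard, perShardLimit)
                    .forEach((term, sum) -> merged.merge(term, sum, Double::sum));
        }
        return merged;
    }

    /** Per-shard sums of seller_measure, taken from the data in the ticket. */
    static List<Map<String, Double>> ticketShards() {
        List<Map<String, Double>> shards = new ArrayList<>();
        shards.add(Map.of("Tyrion", 1.0, "Jon", 50.0, "PoorNed", 4.0));   // sellers_2014
        shards.add(Map.of("Tyrion", 100.0, "Jon", 50.0, "PoorNed", 4.0)); // sellers_2015
        shards.add(Map.of("Tyrion", 1.0, "Jon", 50.0, "PoorNed", 4.0));   // sellers_2016
        return shards;
    }

    public static void main(String[] args) {
        // 6 slots per shard cover all 3 terms, so the sums are exact.
        System.out.println(merge(ticketShards(), 6).get("Tyrion")); // prints 102.0
        // A hypothetical cutoff of 2 slots loses Tyrion's two values of 1.
        System.out.println(merge(ticketShards(), 2).get("Tyrion")); // prints 100.0
    }
}
```

The simulation only shows why truncating each shard before merging can undercount a term; whether the real module behaves exactly this way for a given request depends on details the sketch leaves out.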


[[email protected]], this functionality has existed in FacetField since it was
introduced in SOLR-7214.
The page http://yonik.com/json-facet-api/ states:
bq. limit – Limits the number of buckets returned. Defaults to 10.

However, it does not mention that the values returned for the facet are
approximate and are influenced by this parameter.

I think this is an important hint for developers using the (JSON) Facet API.
In light of the above, is this still a "bug", or can it be considered a
"feature"?

> Wrong behaviour in json facets
> ------------------------------
>
>                 Key: SOLR-8768
>                 URL: https://issues.apache.org/jira/browse/SOLR-8768
>             Project: Solr
>          Issue Type: Bug
>          Components: Facet Module
>            Reporter: Pablo Anzorena
>
> This bug is quite difficult to explain, so I will first show it with an
> example and then explain it.
> I have a core split into three shards; let's call them 'sellers_2014',
> 'sellers_2015' and 'sellers_2016'.
> The schema has the following fields:
> seller_name, string
> seller_measure, double
> seller_date, date
> It contains the following data:
> 'sellers_2014'
> Tyrion, 1, 2014-01-01T00:00:00Z
> Jon, 50, 2014-01-01T00:00:00Z
> PoorNed, 4, 2014-01-01T00:00:00Z
> 'sellers_2015'
> Tyrion, 100, 2015-01-01T00:00:00Z
> Jon, 50, 2015-01-01T00:00:00Z
> PoorNed, 4, 2015-01-01T00:00:00Z
> 'sellers_2016'
> Tyrion, 1, 2016-01-01T00:00:00Z
> Jon, 50, 2016-01-01T00:00:00Z
> PoorNed, 4, 2016-01-01T00:00:00Z
> Request:
> http://localhost:8983/solr/sellers_2016/select?q=*:*&shards=localhost:8983/solr/sellers_2014,localhost:8983/solr/sellers_2015,localhost:8983/solr/sellers_2016&json.facet=
> {code}
> {
>   top_sellers: {
>     type: terms,
>     field: seller_name,
>     limit: 2,
>     offset: 0,
>     sort: "seller_measure desc",
>     facet: {
>       seller_measure: "sum(seller_measure)",
>       seller_dates: {
>         type: range,
>         field: seller_date,
>         start: "2014-01-01T00:00:00Z",
>         end: "2016-12-31T00:00:00Z",
>         gap: "+1YEARS",
>         facet: {
>           seller_measure: "sum(seller_measure)"
>         }
>       }
>     }
>   }
> }
> {code}
> With this request, I want to know the top 2 sellers across the three
> shards and, for each seller, their seller_measure for each year.
> The response I'm getting is:
> {code}
> {
>   "val": "Jon",
>   "count": 3,
>   "seller_measure": 150,
>   "seller_dates": {
>     "buckets": [
>       {
>         "val": "2014-01-01T00:00:00Z",
>         "count": 1,
>         "seller_measure": 50
>       },
>       {
>         "val": "2015-01-01T00:00:00Z",
>         "count": 1,
>         "seller_measure": 50
>       },
>       {
>         "val": "2016-01-01T00:00:00Z",
>         "count": 1,
>         "seller_measure": 50
>       }
>     ]
>   }
> },
> {
>   "val": "Tyrion",
>   "count": 3,
>   "seller_measure": 102,
>   "seller_dates": {
>     "buckets": [
>       {
>         "val": "2015-01-01T00:00:00Z",
>         "count": 1,
>         "seller_measure": 100
>       }
>     ]
>   }
> }
> {code}
> which is incorrect, because Tyrion's 2014 and 2016 buckets are
> missing.
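For comparison with the response above, a brute-force Java sketch of what the nested range facet should report for one seller: it sums seller_measure per year over all nine documents, with no sharding and no limits. The `ExpectedBuckets` class, its `Doc` type and the `yearlySums` method are hypothetical illustrations, and the sketch assumes the sellers_2016 documents are dated 2016, as Jon's 2016 bucket in the response indicates.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical ground-truth computation: sum(seller_measure) per year for one
// seller over every document, i.e. what an unsharded, unlimited facet reports.
final class ExpectedBuckets {

    static final class Doc {
        final String seller;
        final double measure;
        final int year;

        Doc(String seller, double measure, int year) {
            this.seller = seller;
            this.measure = measure;
            this.year = year;
        }
    }

    // The ticket's data, assuming the sellers_2016 documents are dated 2016.
    static final List<Doc> DOCS = List.of(
            new Doc("Tyrion", 1, 2014), new Doc("Jon", 50, 2014), new Doc("PoorNed", 4, 2014),
            new Doc("Tyrion", 100, 2015), new Doc("Jon", 50, 2015), new Doc("PoorNed", 4, 2015),
            new Doc("Tyrion", 1, 2016), new Doc("Jon", 50, 2016), new Doc("PoorNed", 4, 2016));

    /** Year -> sum(seller_measure) for one seller, years in ascending order. */
    static Map<Integer, Double> yearlySums(String seller) {
        Map<Integer, Double> sums = new TreeMap<>();
        for (Doc d : DOCS) {
            if (d.seller.equals(seller)) {
                sums.merge(d.year, d.measure, Double::sum);
            }
        }
        return sums;
    }

    public static void main(String[] args) {
        System.out.println(yearlySums("Tyrion")); // prints {2014=1.0, 2015=100.0, 2016=1.0}
    }
}
```

Under that assumption, Tyrion's 2014 and 2016 buckets should each be present with a sum of 1, alongside the 2015 bucket with 100.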



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
