[ 
https://issues.apache.org/jira/browse/SOLR-8768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15180181#comment-15180181
 ] 

Pablo Anzorena commented on SOLR-8768:
--------------------------------------

[~mariusneo] You are right: with the sample I posted, the error can't be 
reproduced. 

Now, with real data (the cardinality of seller_name is around 2000), this is the 
response if I ask for the top 3:
{code}
{
  "responseHeader": {
    "status": 0,
    "QTime": 1992,
    "params": {
      "q": "*:*",
      "shards": "localhost:8983/solr/sellers_2005,localhost:8983/solr/sellers_2006,localhost:8983/solr/sellers_2007",
      "json.facet": "{\n  top_sellers: {\n    type: terms,\n    field: seller_name,\n    limit: 3,\n    offset: 0,\n    sort: \"seller_measure desc\",\n    facet: {\n      seller_measure: \"sum(seller_measure)\"\n    }\n  }\n}",
      "rows": "0",
      "wt": "json"
    }
  },
  "response": {
    "numFound": 94641193,
    "start": 0,
    "maxScore": 1.0,
    "docs": [
      
    ]
  },
  "facets": {
    "count": 94641193,
    "top_sellers": {
      "buckets": [
        {
          "val": "Tyrion",
          "count": 22067,
          "seller_measure": 6.381640740799999E8
        },
        {
          "val": "Jon",
          "count": 9323,
          "seller_measure": 4.376016594200097E8
        },
        {
          "val": "PoorNed",
          "count": 3714,
          "seller_measure": 2.1381292140000007E8
        }
      ]
    }
  }
}
{code}

Now look at what happens when I change the query to filter specifically on 
those three seller_names:

{code}
{
  "responseHeader": {
    "status": 0,
    "QTime": 26,
    "params": {
      "q": "seller_name:(Tyrion Jon PoorNed)",
      "shards": "localhost:8983/solr/sellers_2005,localhost:8983/solr/sellers_2006,localhost:8983/solr/sellers_2007",
      "json.facet": "{\n  top_sellers: {\n    type: terms,\n    field: seller_name,\n    limit: 3,\n    offset: 0,\n    sort: \"seller_measure desc\",\n    facet: {\n      seller_measure: \"sum(seller_measure)\"\n    }\n  }\n}",
      "rows": "0",
      "wt": "json"
    }
  },
  "response": {
    "numFound": 37552,
    "start": 0,
    "maxScore": 2.4321828,
    "docs": [
      
    ]
  },
  "facets": {
    "count": 37552,
    "top_sellers": {
      "buckets": [
        {
          "val": "Tyrion",
          "count": 24515,
          "seller_measure": 6.436709089399998E8
        },
        {
          "val": "Jon",
          "count": 9323,
          "seller_measure": 4.376016594200096E8
        },
        {
          "val": "PoorNed",
          "count": 3714,
          "seller_measure": 2.1381292140000007E8
        }
      ]
    }
  }
}
{code}
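
To make the discrepancy between the two responses explicit, here is a small Python sketch that diffs the buckets copied from above. The helper name `diff_buckets` is only for illustration; the numbers are taken verbatim from the two responses.

```python
# Buckets copied from the two responses above: val -> (count, seller_measure).
unfiltered = {
    "Tyrion": (22067, 6.381640740799999E8),
    "Jon": (9323, 4.376016594200097E8),
    "PoorNed": (3714, 2.1381292140000007E8),
}
filtered = {
    "Tyrion": (24515, 6.436709089399998E8),
    "Jon": (9323, 4.376016594200096E8),
    "PoorNed": (3714, 2.1381292140000007E8),
}


def diff_buckets(a, b):
    """Return {val: (count_delta, measure_delta)} computed as b minus a."""
    return {k: (b[k][0] - a[k][0], b[k][1] - a[k][1]) for k in a}


deltas = diff_buckets(unfiltered, filtered)
for name, (count_delta, measure_delta) in deltas.items():
    print(f"{name}: count {count_delta:+d}, seller_measure {measure_delta:+.2f}")
```

Only Tyrion's bucket differs materially: the unfiltered request is missing 2448 documents for him. The last-digit difference in Jon's seller_measure is presumably just floating-point summation order.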

See the difference in Tyrion's count and seller_measure? I think this happens 
because, when each shard ranks sellers by seller_measure in descending order, 
Tyrion is only around position 1000 in shards sellers_2005 and sellers_2006, 
so those shards do not return his bucket for a top-3 request.

If I make the same request with limit 2000, Tyrion appears in the top 3 with 
the correct measure, that is, the sum across the three shards.
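
The explanation above can be sketched in a few lines of Python. This is a simplified model of a distributed top-N merge, not Solr's actual implementation (which is more involved and may, for example, request more than `limit` buckets from each shard). The function name `merge_top_n`, the extra seller names and all the numbers are hypothetical.

```python
from collections import defaultdict


def merge_top_n(shards, limit):
    """Naive distributed merge: each shard returns only its own top
    `limit` terms by sum, and the coordinator adds up what it receives."""
    merged = defaultdict(float)
    for shard in shards:
        # Per-shard truncation: anything below rank `limit` is dropped here.
        top = sorted(shard.items(), key=lambda kv: kv[1], reverse=True)[:limit]
        for term, total in top:
            merged[term] += total
    ranked = sorted(merged.items(), key=lambda kv: kv[1], reverse=True)
    return ranked[:limit]


# Hypothetical per-shard sums of seller_measure (not the real data).
shards = [
    {"Tyrion": 5.0, "Jon": 40.0, "A": 30.0, "B": 20.0},    # Tyrion ranks low here
    {"Tyrion": 6.0, "Jon": 40.0, "A": 30.0, "C": 20.0},    # and here
    {"Tyrion": 200.0, "Jon": 40.0, "B": 10.0, "C": 5.0},   # but dominates this shard
]

small = merge_top_n(shards, limit=3)    # Tyrion's total misses two shards
large = merge_top_n(shards, limit=10)   # limit covers every term on every shard
print(small)
print(large)
```

In this model, Tyrion still ranks first with `limit=3`, but his sum only includes the one shard where he made the local top 3. Once the limit is large enough to cover every term on every shard, his sum includes all three shards, which matches what I see with limit 2000.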

> Wrong behaviour in json facets
> ------------------------------
>
>                 Key: SOLR-8768
>                 URL: https://issues.apache.org/jira/browse/SOLR-8768
>             Project: Solr
>          Issue Type: Bug
>          Components: Facet Module
>            Reporter: Pablo Anzorena
>
> This bug is quite difficult to explain, so I will first show it with an 
> example and then explain it.
> I have a core split into three shards, let's call them 'sellers_2014', 
> 'sellers_2015', 'sellers_2016'.
> The schema has the following fields:
> seller_name, string
> seller_measure, double
> seller_date, date
> With the following data.
> 'sellers_2014'
> Tyrion, 1, 2014-01-01T00:00:00Z
> Jon, 50, 2014-01-01T00:00:00Z
> PoorNed, 4, 2014-01-01T00:00:00Z
> 'sellers_2015'
> Tyrion, 100, 2015-01-01T00:00:00Z
> Jon, 50, 2015-01-01T00:00:00Z
> PoorNed, 4, 2015-01-01T00:00:00Z
> 'sellers_2016'
> Tyrion, 1, 2015-01-01T00:00:00Z
> Jon, 50, 2015-01-01T00:00:00Z
> PoorNed, 4, 2015-01-01T00:00:00Z
> Request:
> http://localhost:8983/solr/sellers_2016/select?q=*:*&shards=localhost:8983/solr/sellers_2014,localhost:8983/solr/sellers_2015,localhost:8983/solr/sellers_2016&json.facet=
> {code}
> {
>   top_sellers: {
>     type: terms,
>     field: seller_name,
>     limit: 2,
>     offset: 0,
>     sort: "seller_measure desc",
>     facet: {
>       seller_measure: "sum(seller_measure)",
>       seller_dates: {
>         type: range,
>         field: seller_date,
>         start: "2014-01-01T00:00:00Z",
>         end: "2016-12-31T00:00:00Z",
>         gap: "+1YEARS",
>         facet: {
>           seller_measure: "sum(seller_measure)"
>         }
>       }
>     }
>   }
> }
> {code}
> So... With the request I want to know the top 2 sellers across the three 
> shards and for each seller, their seller_measure for each year.
> The response I'm getting is:
> {code}
> "val": "Jon",
> "count": 3,
> "seller_measure": 150,
> "seller_dates": {
>   "buckets": [
>     {
>       "val": "2014-01-01T00:00:00Z",
>       "count": 1,
>       "seller_measure": 50
>     },
>     {
>       "val": "2015-01-01T00:00:00Z",
>       "count": 1,
>       "seller_measure": 50
>     },
>     {
>       "val": "2016-01-01T00:00:00Z",
>       "count": 1,
>       "seller_measure": 50
>     }
>   ]
> },
> "val": "Tyrion",
> "count": 3,
> "seller_measure": 102,
> "seller_dates": {
>   "buckets": [
>     {
>       "val": "2015-01-01T00:00:00Z",
>       "count": 1,
>       "seller_measure": 100
>     }
>   ]
> }
> {code}
> which is incorrect, because the two buckets of 2014 and 2016 in Tyrion are 
> missing.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
