Gerald Bonfiglio created SOLR-16267:
---------------------------------------

             Summary: JSON Facet Stats methods include docs with no field value 
when using nested function
                 Key: SOLR-16267
                 URL: https://issues.apache.org/jira/browse/SOLR-16267
             Project: Solr
          Issue Type: Bug
      Security Level: Public (Default Security Level. Issues are Public)
          Components: Facet Module
    Affects Versions: 8.11.1
         Environment: Ubuntu 18.04.5 LTS (GNU/Linux 4.15.0-184-generic x86_64)

Java(TM) SE Runtime Environment (build 1.8.0_211-b12)
            Reporter: Gerald Bonfiglio


I’m seeing unexpected behavior when using the JSON Facet API with stats 
functions that wrap a nested function.  The example below illustrates what 
I’m seeing.

I have the following JSON Facet request:

 
{code:java}
json.facet={
   "grp_0": {
      "field": "ssnm",
      "limit": -1,
      "type": "terms",
      "mincount": 1,
      "refine": true,
      "sort": {"index": "asc"},
      "facet": {
         "avg_TotalCpuUsec": "avg(TotalCpuUsec)",
         "avg_sqrt_TotalCpuUsec": "avg(sqrt(TotalCpuUsec))",
         "count_TotalCpuUsec": "countvals(TotalCpuUsec)",
         "count_sqrt_TotalCpuUsec": "countvals(sqrt(TotalCpuUsec))",
         "sum_TotalCpuUsec": "sum(TotalCpuUsec)",
         "sum_sqrt_TotalCpuUsec": "sum(sqrt(TotalCpuUsec))"
      }
   }
}
{code}
And an example of one of the buckets returned is:

 
{code:java}
  "facets":{
    "count":32,
    "grp_0":{
      "buckets":[{
          "val":"Activity",
          "count":6,
          "count_sqrt_TotalCpuUsec":6,
          "sum_sqrt_TotalCpuUsec":495.29246931322893,
          "count_TotalCpuUsec":4,
          "sum_TotalCpuUsec":61464.399999999994,
          "avg_TotalCpuUsec":15366.099999999999,
          "avg_sqrt_TotalCpuUsec":82.54874488553816},
.
.
.
} ]}}}
{code}
 

Notice that there are 6 documents in the bucket, but only 4 of them have the 
field “TotalCpuUsec”, which is reflected in the value of 
countvals(TotalCpuUsec).  My issue is with the calculation of 
avg(sqrt(TotalCpuUsec)).  The calculation of avg(TotalCpuUsec) is correct: it 
equals sum(TotalCpuUsec) / 4.  However, the value of avg(sqrt(TotalCpuUsec)) 
equals sum(sqrt(TotalCpuUsec)) / 6.  I think the sum should have been divided 
by 4, since only 4 documents have a value for this field.  It appears that 
sqrt(TotalCpuUsec) returns 0.0 for documents that don’t have the field, so the 
0.0 for those 2 documents is factored into the avg calculation.  This is 
consistent with the value of countvals(sqrt(TotalCpuUsec)), which is 6.
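
As a sanity check on the arithmetic above, here is a small Python sketch that 
uses only the numbers from the “Activity” bucket.  The variable and function 
names are my own, and it only checks which divisor reproduces each reported 
average; it says nothing about how Solr computes these values internally.

```python
import math

# Values copied from the "Activity" bucket in the response above.
bucket_count = 6                  # documents in the bucket ("count")
count_vals = 4                    # countvals(TotalCpuUsec)
sum_raw = 61464.399999999994      # sum(TotalCpuUsec)
avg_raw = 15366.099999999999      # avg(TotalCpuUsec)
sum_sqrt = 495.29246931322893     # sum(sqrt(TotalCpuUsec))
avg_sqrt = 82.54874488553816      # avg(sqrt(TotalCpuUsec))


def matches(total, divisor, reported):
    """Return True if total / divisor reproduces the reported average."""
    return math.isclose(total / divisor, reported, rel_tol=1e-12)


# avg(TotalCpuUsec) is consistent with dividing by the 4 docs that have a value.
raw_uses_value_count = matches(sum_raw, count_vals, avg_raw)

# avg(sqrt(TotalCpuUsec)) is consistent with dividing by all 6 docs instead.
sqrt_uses_bucket_count = matches(sum_sqrt, bucket_count, avg_sqrt)
sqrt_uses_value_count = matches(sum_sqrt, count_vals, avg_sqrt)

# The average I would have expected: the same sum divided by 4 (roughly 123.82).
expected_avg_sqrt = sum_sqrt / count_vals

print("avg(TotalCpuUsec) == sum / 4:      ", raw_uses_value_count)
print("avg(sqrt(TotalCpuUsec)) == sum / 6:", sqrt_uses_bucket_count)
print("avg(sqrt(TotalCpuUsec)) == sum / 4:", sqrt_uses_value_count)
print("expected avg(sqrt(TotalCpuUsec)):  ", expected_avg_sqrt)
```

Only the divisor differs between the two cases: the sum of the square roots is 
unaffected by the extra 0.0 values, so the discrepancy shows up in avg and 
countvals but not in sum.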

This seems like a bug, but I wanted to ask whether this is “working as 
expected” and whether there are facet attributes that can be set to work 
around it.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)
