[ https://issues.apache.org/jira/browse/SOLR-16267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Gerald Bonfiglio updated SOLR-16267:
------------------------------------
Description:
I’m seeing unexpected and undesirable behavior when a JSON Facet API stats
function wraps a nested function such as sqrt(). The example below illustrates
the behavior.
I am sending the following JSON Facet request:
{code:java}
json.facet={
  "grp_0": {
    "field": "ssnm",
    "limit": -1,
    "type": "terms",
    "mincount": 1,
    "refine": true,
    "sort": {"index": "asc"},
    "facet": {
      "avg_TotalCpuUsec": "avg(TotalCpuUsec)",
      "avg_sqrt_TotalCpuUsec": "avg(sqrt(TotalCpuUsec))",
      "count_TotalCpuUsec": "countvals(TotalCpuUsec)",
      "count_sqrt_TotalCpuUsec": "countvals(sqrt(TotalCpuUsec))",
      "sum_TotalCpuUsec": "sum(TotalCpuUsec)",
      "sum_sqrt_TotalCpuUsec": "sum(sqrt(TotalCpuUsec))"
    }
  }
}
{code}
And an example of one of the buckets returned is:
{code:java}
"facets":{
  "count":32,
  "grp_0":{
    "buckets":[{
        "val":"Activity",
        "count":6,
        "count_sqrt_TotalCpuUsec":6,
        "sum_sqrt_TotalCpuUsec":495.29246931322893,
        "count_TotalCpuUsec":4,
        "sum_TotalCpuUsec":61464.399999999994,
        "avg_TotalCpuUsec":15366.099999999999,
        "avg_sqrt_TotalCpuUsec":82.54874488553816},
        .
        .
        .
      } ]}}}
{code}
Notice that the bucket contains 6 documents, but only 4 of them have the field
“TotalCpuUsec”, as reflected in the value of countvals(TotalCpuUsec).
My issue is with the calculation of avg(sqrt(TotalCpuUsec)). The calculation
of avg(TotalCpuUsec) is correct: it equals sum(TotalCpuUsec) / 4. However, the
value of avg(sqrt(TotalCpuUsec)) equals sum(sqrt(TotalCpuUsec)) / 6. I think
the divisor should be 4, since only 4 documents have a value for this field.
It appears that sqrt(TotalCpuUsec) returns 0.0 for documents that don’t have
the field, so the 0.0 from those 2 documents is included in the avg
calculation. The value of countvals(sqrt(TotalCpuUsec)), which is 6, seems to
confirm this.
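The arithmetic can be checked directly from the numbers in the bucket above. The following Python sketch is illustrative only: the first part uses the reported values, while the second part is a toy model of the suspected behavior. The list toy_values and both helper functions are hypothetical and are not Solr code or data from the index.

```python
import math

# Values copied from the "Activity" bucket in the response.
bucket_count = 6            # "count"
count_vals = 4              # "count_TotalCpuUsec"
reported_sum = 61464.399999999994
reported_avg = 15366.099999999999
reported_sum_sqrt = 495.29246931322893
reported_avg_sqrt = 82.54874488553816

# avg(TotalCpuUsec) divides by the 4 documents that have the field.
print(math.isclose(reported_sum / count_vals, reported_avg))

# avg(sqrt(TotalCpuUsec)) matches a division by all 6 documents in the bucket.
print(math.isclose(reported_sum_sqrt / bucket_count, reported_avg_sqrt))

# The value that a division by 4 would give.
expected_avg_sqrt = reported_sum_sqrt / count_vals
print(round(expected_avg_sqrt, 2))


def avg_sqrt_skipping_missing(values):
    """Average of sqrt(v) over documents that have a value (expected behavior)."""
    roots = [math.sqrt(v) for v in values if v is not None]
    return sum(roots) / len(roots)


def avg_sqrt_missing_as_zero(values):
    """Average of sqrt(v) where a missing value contributes 0.0 (suspected behavior)."""
    roots = [math.sqrt(v) if v is not None else 0.0 for v in values]
    return sum(roots) / len(roots)


# Hypothetical bucket: 4 documents with a value, 2 without (None).
toy_values = [100.0, 400.0, 900.0, 1600.0, None, None]
print(avg_sqrt_skipping_missing(toy_values))
print(avg_sqrt_missing_as_zero(toy_values))
```

In the toy model the two averages differ only in the divisor (4 versus 6), which is the same pattern seen in the reported bucket.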
This looks like a bug, but I wanted to ask whether it is “working as expected”
and whether any facet attributes can be set to work around it.
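One candidate workaround, untested here and offered only as an assumption, is to nest the stats inside a query facet that restricts each bucket to documents that have a value for the field. The Python sketch below builds such a request. The sub-facet name has_value, the helper build_facet, and the main query `*:*` are hypothetical choices for illustration; whether this yields the desired divisor on 8.11.1 has not been verified.

```python
import json


def build_facet(field="TotalCpuUsec", group_field="ssnm"):
    """Build a terms facet whose stats run only over docs that have `field`."""
    return {
        "grp_0": {
            "type": "terms",
            "field": group_field,
            "limit": -1,
            "mincount": 1,
            "refine": True,
            "sort": {"index": "asc"},
            "facet": {
                # Hypothetical sub-facet name; a query facet narrows the
                # bucket's domain to documents matching `q`.
                "has_value": {
                    "type": "query",
                    "q": f"{field}:[* TO *]",
                    "facet": {
                        f"avg_sqrt_{field}": f"avg(sqrt({field}))",
                        f"count_sqrt_{field}": f"countvals(sqrt({field}))",
                    },
                }
            },
        }
    }


# Request parameters; the main query "*:*" is an assumption, since the
# original report does not show it.
params = {"q": "*:*", "rows": 0, "json.facet": json.dumps(build_facet())}
print(params["json.facet"])
```

If this behaves as assumed, the stats would appear under the has_value key of each bucket, and the count reported there should equal countvals(TotalCpuUsec).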
> JSON Facet Stats methods include docs with no field value when using nested
> function
> ------------------------------------------------------------------------------------
>
> Key: SOLR-16267
> URL: https://issues.apache.org/jira/browse/SOLR-16267
> Project: Solr
> Issue Type: Bug
> Security Level: Public(Default Security Level. Issues are Public)
> Components: Facet Module
> Affects Versions: 8.11.1
> Environment: Ubuntu 18.04.5 LTS (GNU/Linux 4.15.0-184-generic x86_64)
> Java(TM) SE Runtime Environment (build 1.8.0_211-b12)
> Reporter: Gerald Bonfiglio
> Priority: Major
> Labels: Facet, JSON