damnMeddlingKid opened a new issue #10073:
URL: https://github.com/apache/druid/issues/10073
### Affected Version
Tested in version 0.18.1
### Description
While investigating the correctness of Druid's null handling by setting
`druid.generic.useDefaultValueForNull=false`, I noticed that the `avg` aggregate
does not ignore nulls in its count. The returned average is the same
as if the nulls had been replaced with 0.
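To make the discrepancy concrete, here is a minimal sketch in plain Python (not Druid code) that computes both averages over the `duration` values from the sample data in the steps below. The report does not include the `avg` query or its output, so these figures are derived only from the sample data, and the variable names are illustrative.

```python
# Values of the `duration` column from the sample data; None stands for SQL NULL.
durations = [None, None, 98, 6, 73, 1, 56, None, None, None]

non_null = [d for d in durations if d is not None]

# SQL-standard AVG: nulls are skipped in both the sum and the count.
avg_ignoring_nulls = sum(non_null) / len(non_null)

# The behaviour described in this report: nulls are counted as if they were 0.
avg_nulls_as_zero = sum(d or 0 for d in durations) / len(durations)

print(avg_ignoring_nulls)  # 46.8
print(avg_nulls_as_zero)   # 23.4
```

With five non-null values summing to 234, an average that ignores nulls is 234 / 5 = 46.8, while counting all ten rows gives 234 / 10 = 23.4.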
### Steps to reproduce
1) Set `druid.generic.useDefaultValueForNull=false` in
common.runtime.properties. I used the micro-quickstart to test this
(`./bin/start-micro-quickstart`).
2) Load the example data with the ingestion spec below, which uses an inline
input source. The data looks like this:
```
{"id":1,"duration":null}
{"id":2,"duration":null}
{"id":3,"duration":98}
{"id":4,"duration":6}
{"id":5,"duration":73}
{"id":6,"duration":1}
{"id":7,"duration":56}
{"id":8,"duration":null}
{"id":9,"duration":null}
{"id":10,"duration":null}
```
3) Run the following query:
```
select duration, duration is null as "duration_is_null" from just_duration
```
**Actual result:**
```
{"duration":null,"duration_is_null":null}
{"duration":null,"duration_is_null":null}
{"duration":98,"duration_is_null":false}
{"duration":6,"duration_is_null":false}
{"duration":73,"duration_is_null":false}
{"duration":1,"duration_is_null":false}
{"duration":56,"duration_is_null":false}
{"duration":null,"duration_is_null":null}
{"duration":null,"duration_is_null":null}
{"duration":null,"duration_is_null":null}
```
**Expected result:** `duration_is_null` should be `true` when `duration` is
null.
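For comparison, the same query can be run against an engine with standard SQL null semantics. The sketch below uses Python's built-in `sqlite3` module purely as a stand-in; it is not Druid and says nothing about Druid internals. SQLite reports booleans as `1`/`0`, so the flag is converted with `bool()` for printing.

```python
import sqlite3

# In-memory table mirroring the sample data; None is stored as SQL NULL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE just_duration (id INTEGER, duration INTEGER)")
rows = [(1, None), (2, None), (3, 98), (4, 6), (5, 73),
        (6, 1), (7, 56), (8, None), (9, None), (10, None)]
conn.executemany("INSERT INTO just_duration VALUES (?, ?)", rows)

# Same query as in step 3, ordered by id for a stable result.
result = conn.execute(
    'SELECT duration, duration IS NULL AS "duration_is_null" '
    "FROM just_duration ORDER BY id"
).fetchall()

for duration, is_null in result:
    print(duration, bool(is_null))
```

Here every row with a null `duration` yields a true `duration_is_null`, and every other row yields false, which matches the expected result stated above.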
### Example ingestion spec
```
{
  "type": "index_parallel",
  "spec": {
    "ioConfig": {
      "type": "index_parallel",
      "inputSource": {
        "type": "inline",
        "data": "{\"id\":1,\"duration\":null}\n{\"id\":2,\"duration\":null}\n{\"id\":3,\"duration\":98}\n{\"id\":4,\"duration\":6}\n{\"id\":5,\"duration\":73}\n{\"id\":6,\"duration\":1}\n{\"id\":7,\"duration\":56}\n{\"id\":8,\"duration\":null}\n{\"id\":9,\"duration\":null}\n{\"id\":10,\"duration\":null}"
      },
      "inputFormat": {
        "type": "json"
      }
    },
    "tuningConfig": {
      "type": "index_parallel",
      "partitionsSpec": {
        "type": "dynamic"
      }
    },
    "dataSchema": {
      "dataSource": "just_duration",
      "granularitySpec": {
        "type": "uniform",
        "queryGranularity": "NONE",
        "rollup": false,
        "segmentGranularity": "YEAR"
      },
      "timestampSpec": {
        "column": "id",
        "format": "posix"
      },
      "dimensionsSpec": {
        "dimensions": [
          {
            "type": "long",
            "name": "duration"
          }
        ]
      }
    }
  }
}
```