damnMeddlingKid opened a new issue #10073:
URL: https://github.com/apache/druid/issues/10073


   ### Affected Version
   
   Tested in version 0.18.1
   
   ### Description
   
   While investigating the correctness of Druid's null handling with `druid.generic.useDefaultValueForNull=false`, I noticed that the `avg` aggregate does not correctly ignore nulls in its count. The returned average is the same as if the nulls had been filled with 0.
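
For reference, the two behaviours can be contrasted with a small standalone Python sketch. It is not Druid code: the function names are made up for illustration, and the values are the `duration` column from the sample rows under "Steps to reproduce", with `None` standing in for JSON `null`.

```python
# `duration` values from the sample rows; None stands for JSON null.
durations = [None, None, 98, 6, 73, 1, 56, None, None, None]


def avg_ignoring_nulls(values):
    """SQL-style AVG: null rows add to neither the sum nor the count."""
    present = [v for v in values if v is not None]
    return sum(present) / len(present) if present else None


def avg_nulls_as_zero(values):
    """Average with every null replaced by 0, so nulls still count as rows."""
    return sum(0 if v is None else v for v in values) / len(values)


print(avg_ignoring_nulls(durations))  # 234 / 5  = 46.8
print(avg_nulls_as_zero(durations))   # 234 / 10 = 23.4
```

With SQL-compatible null handling, one would expect the first figure; the second is what a zero-filled column would produce.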
   
   ### Steps to reproduce
   
   1) Set `druid.generic.useDefaultValueForNull=false` in `common.runtime.properties`. I used the micro-quickstart to test this (`./bin/start-micro-quickstart`).
   
   2) Load the example data using the ingestion spec below, which uses an inline input source. The data looks like this:
   
   ```
   {"id":1,"duration":null}
   {"id":2,"duration":null}
   {"id":3,"duration":98}
   {"id":4,"duration":6}
   {"id":5,"duration":73}
   {"id":6,"duration":1}
   {"id":7,"duration":56}
   {"id":8,"duration":null}
   {"id":9,"duration":null}
   {"id":10,"duration":null}
   ```
   
   3) Run the following query:
   
   ```
   select duration, duration is null as "duration_is_null" from just_duration
   ```
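
One way to run the query in step 3 outside the web console is to POST it to Druid's SQL HTTP endpoint (`/druid/v2/sql`). The sketch below is illustrative only: the host and port (`localhost:8888`, the micro-quickstart router) and the choice of the `objectLines` result format are assumptions, and the helper names are hypothetical.

```python
import json
import urllib.request

# Assumption: the micro-quickstart router is listening on localhost:8888.
DRUID_SQL_URL = "http://localhost:8888/druid/v2/sql"

QUERY = 'select duration, duration is null as "duration_is_null" from just_duration'


def build_sql_request(query, url=DRUID_SQL_URL):
    """Build a POST request for the SQL endpoint; nothing is sent yet."""
    # "objectLines" asks for one JSON object per line (assumed format).
    payload = {"query": query, "resultFormat": "objectLines"}
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_query(query):
    """Send the query and return one dict per non-empty result line."""
    with urllib.request.urlopen(build_sql_request(query)) as resp:
        lines = resp.read().decode("utf-8").splitlines()
    return [json.loads(line) for line in lines if line.strip()]
```

Calling `run_query(QUERY)` against a running cluster should return one dictionary per row, which makes the `duration_is_null` values easy to inspect programmatically.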
   
   **Actual result:**
   
   ```
   {"duration":null,"duration_is_null":null}
   {"duration":null,"duration_is_null":null}
   {"duration":98,"duration_is_null":false}
   {"duration":6,"duration_is_null":false}
   {"duration":73,"duration_is_null":false}
   {"duration":1,"duration_is_null":false}
   {"duration":56,"duration_is_null":false}
   {"duration":null,"duration_is_null":null}
   {"duration":null,"duration_is_null":null}
   {"duration":null,"duration_is_null":null}
   ```
   
   **Expected result:** `duration_is_null` should be `true` when `duration` is 
null.
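
To spell out the expected output row by row, here is a short Python sketch that derives `duration_is_null` from the sample data. It is not produced by Druid; it only encodes the rule stated above, with `None` standing in for JSON `null`.

```python
import json

# Sample rows from the inline data; None stands for JSON null.
rows = [
    {"id": 1, "duration": None},
    {"id": 2, "duration": None},
    {"id": 3, "duration": 98},
    {"id": 4, "duration": 6},
    {"id": 5, "duration": 73},
    {"id": 6, "duration": 1},
    {"id": 7, "duration": 56},
    {"id": 8, "duration": None},
    {"id": 9, "duration": None},
    {"id": 10, "duration": None},
]

# IS NULL is a two-valued predicate: it yields true or false, never null.
expected = [
    {"duration": r["duration"], "duration_is_null": r["duration"] is None}
    for r in rows
]

for row in expected:
    print(json.dumps(row, separators=(",", ":")))
```

The first printed line is `{"duration":null,"duration_is_null":true}`, in contrast to the `null` flag in the actual result.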
   
   ### Example ingestion spec
   
   ```
   {
     "type": "index_parallel",
     "spec": {
       "ioConfig": {
         "type": "index_parallel",
         "inputSource": {
           "type": "inline",
           "data": "{\"id\":1,\"duration\":null}\n{\"id\":2,\"duration\":null}\n{\"id\":3,\"duration\":98}\n{\"id\":4,\"duration\":6}\n{\"id\":5,\"duration\":73}\n{\"id\":6,\"duration\":1}\n{\"id\":7,\"duration\":56}\n{\"id\":8,\"duration\":null}\n{\"id\":9,\"duration\":null}\n{\"id\":10,\"duration\":null}"
         },
         "inputFormat": {
           "type": "json"
         }
       },
       "tuningConfig": {
         "type": "index_parallel",
         "partitionsSpec": {
           "type": "dynamic"
         }
       },
       "dataSchema": {
         "dataSource": "just_duration",
         "granularitySpec": {
           "type": "uniform",
           "queryGranularity": "NONE",
           "rollup": false,
           "segmentGranularity": "YEAR"
         },
         "timestampSpec": {
           "column": "id",
           "format": "posix"
         },
         "dimensionsSpec": {
           "dimensions": [
             {
               "type": "long",
               "name": "duration"
             }
           ]
         }
       }
     }
   }
   ```




