aharbunou-branch opened a new issue, #13987:
URL: https://github.com/apache/druid/issues/13987

   ### Affected Version
   
   0.19.+
   
   ### Description
   
   I'm upgrading Druid from 0.18.1 to 25.0.0.
   We use HLL sketches from the druid-datasketches extension.
   0.18.1 always produces a stable result for HLLSketchMerge, whereas starting from the next version (i.e. 0.19.+) the result differs from call to call.
   
   For this test, I did the following:
   - I deployed a fresh 25.0.0 Druid cluster with just one Historical node.
   - I ingested about 10 hourly segments of test data with a single dimension (value `false`) and an HLL metric with lgK=16. (I tried different lgK values and all of them started to deviate at some point. I tested with lgK=16 because it is what we commonly use right now.)
   - I ran groupBy queries directly against the Historical, bypassing the Broker (I get the same results through the Broker as well).
   - I tried multiple combinations of different dimensions/metrics and saw the same behavior.
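   The repeat-and-compare check from the steps above can be scripted roughly as follows. This is a minimal sketch, not part of the original report: it assumes the Historical serves native queries at `http://localhost:8083/druid/v2/` (8083 is the default Historical port; adjust host and port for your cluster), and the helper names `post_query`, `estimates` and `distinct_results` are hypothetical. The query body mirrors the groupBy request shown below.

```python
import json
import urllib.request
from typing import Callable

# Assumption: native query endpoint of the Historical; adjust host/port as needed.
HISTORICAL_URL = "http://localhost:8083/druid/v2/"

# Same groupBy query as in this issue.
QUERY = {
    "queryType": "groupBy",
    "dataSource": "test_datasource",
    "dimensions": ["test_flag"],
    "granularity": "all",
    "intervals": "2023-03-25T21:00:00Z/2023-04-01T23:59:59Z",
    "aggregations": [
        {
            "type": "HLLSketchMerge",
            "name": "unique_count_16",
            "fieldName": "sketch_unique_count_16",
            "lgK": 16,
            "tgtHllType": "HLL_8",
        }
    ],
}


def post_query(url: str, query: dict) -> list:
    """POST a native Druid query as JSON and return the decoded response."""
    req = urllib.request.Request(
        url,
        data=json.dumps(query).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))


def estimates(rows: list, metric: str = "unique_count_16") -> tuple:
    """Extract sorted (test_flag, estimate) pairs from groupBy result rows."""
    pairs = ((row["event"].get("test_flag"), row["event"][metric]) for row in rows)
    return tuple(sorted(pairs, key=lambda p: str(p[0])))


def distinct_results(run: Callable[[], list], attempts: int = 10) -> set:
    """Run the same query repeatedly; more than one distinct result means nondeterminism."""
    return {estimates(run()) for _ in range(attempts)}
```

   For example, `distinct_results(lambda: post_query(HISTORICAL_URL, QUERY))` returning a set with more than one element reproduces the behavior described in this issue.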
   
   The result is always consistent when just one segment is ingested. However, at some point it starts to deviate, i.e. every time I run a groupBy query against this datasource I get different values for the HLL metric:
   Request:
   ```
   {
      "aggregations":[
         {
            "fieldName":"sketch_unique_count_16",
            "name":"unique_count_16",
            "type":"HLLSketchMerge",
            "lgK":16,
            "tgtHllType":"HLL_8"
         }
      ],
      "dataSource":"test_datasource",
      "dimensions":[
         "test_flag"
      ],
      "queryType":"groupBy",
      "intervals":"2023-03-25T21:00:00Z/2023-04-01T23:59:59Z",
      "granularity":"all"
   }
   ```
   Responses were `15621.84397004316`, `15662.87505956021`, `15600.366289040447`, `15635.015943397264`, etc.
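   For scale, the sketch below computes the spread of the four estimates quoted above relative to their mean. It is an illustration added to this report, not output from Druid. The comparison figure `1.04 / sqrt(K)` is an assumption (the classic HLL relative standard error); DataSketches' own error bounds for a given estimator may differ, and the calculation says nothing about why repeated queries over the same segments disagree.

```python
import statistics

# The four estimates reported above for the same query.
responses = [15621.84397004316, 15662.87505956021, 15600.366289040447, 15635.015943397264]

lg_k = 16
k = 2 ** lg_k  # number of HLL buckets for lgK=16

mean = statistics.mean(responses)
spread = max(responses) - min(responses)  # highest minus lowest estimate
relative_spread = spread / mean

# Assumption: textbook HLL relative standard error of roughly 1.04 / sqrt(K).
approx_rse = 1.04 / k ** 0.5

print(f"mean={mean:.2f} spread={spread:.2f} relative spread={relative_spread:.4%}")
print(f"approximate RSE for lgK={lg_k}: {approx_rse:.4%}")
```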
   
   Another observation: once I downgraded Druid back to 0.18.1, groupBy returned consistent results for the datasource ingested by the newer version.
   
   Is this a new HLL behavior that is expected starting from 0.19.+?
   If not, could you please help find out what could contribute to this inconsistency?
   

