aharbunou-branch opened a new issue, #13987:
URL: https://github.com/apache/druid/issues/13987
### Affected Version
0.19.+
### Description
I'm upgrading Druid from 0.18.1 to 25.0.0.
We use HLL sketches from the druid-datasketches extension.
0.18.1 always produces a stable result for HLLSketchMerge, whereas starting
from the next version (i.e. 0.19.+) the result differs from call to call.
For this test I did the following:
- I deployed a fresh 25.0.0 Druid cluster with just one historical node.
- I ingested about 10 hourly segments of test data with just one
dimension, whose value is always `false`, and an HLL metric with lgK=16. (I tried
different lgK values and all of them started to deviate at some point. I tested
with lgK=16 as it is commonly used right now.)
- I ran groupBy queries directly against the historical, bypassing the broker
(I get the same results through the broker as well).
- I tried multiple combinations of different dimensions/metrics and saw the same
behavior.
The result is always consistent with just one segment ingested. However, at some
point it starts to deviate, i.e. every time I run a groupBy query against this
datasource I see different values for the HLL metrics:
Request
```
{
  "aggregations": [
    {
      "fieldName": "sketch_unique_count_16",
      "name": "unique_count_16",
      "type": "HLLSketchMerge",
      "lgK": 16,
      "tgtHllType": "HLL_8"
    }
  ],
  "dataSource": "test_datasource",
  "dimensions": [
    "test_flag"
  ],
  "queryType": "groupBy",
  "intervals": "2023-03-25T21:00:00Z/2023-04-01T23:59:59Z",
  "granularity": "all"
}
```
Responses were
`15621.84397004316`, `15662.87505956021`, `15600.366289040447`,
`15635.015943397264`, and so on.
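To make the variation easier to reproduce and quantify, here is a minimal Python sketch that sends the same query repeatedly and summarizes the estimates. The URL is an assumption (a historical listening on `localhost:8083` with the native query endpoint at `/druid/v2/`), the helper names are hypothetical, and the row parsing assumes groupBy rows of the form `{"event": {...}}`; adjust these for your cluster.

```python
import json
import urllib.request

# Assumption: a historical on localhost:8083; change host/port for your cluster.
DRUID_URL = "http://localhost:8083/druid/v2/"

# The same query as in the request above.
QUERY = {
    "queryType": "groupBy",
    "dataSource": "test_datasource",
    "dimensions": ["test_flag"],
    "granularity": "all",
    "intervals": "2023-03-25T21:00:00Z/2023-04-01T23:59:59Z",
    "aggregations": [
        {
            "type": "HLLSketchMerge",
            "name": "unique_count_16",
            "fieldName": "sketch_unique_count_16",
            "lgK": 16,
            "tgtHllType": "HLL_8",
        }
    ],
}


def run_query(url=DRUID_URL, query=QUERY):
    """POST the native query and return the parsed JSON response."""
    request = urllib.request.Request(
        url,
        data=json.dumps(query).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def extract_estimates(rows, metric="unique_count_16"):
    """Pull the metric out of groupBy rows (assumes the {"event": {...}} shape)."""
    return [row["event"][metric] for row in rows]


def collect_estimates(runs=20):
    """Run the query `runs` times and return every estimate that came back."""
    estimates = []
    for _ in range(runs):
        estimates.extend(extract_estimates(run_query()))
    return estimates


def summarize(estimates):
    """Report how many distinct values were seen and how far apart they are."""
    lowest, highest = min(estimates), max(estimates)
    mean = sum(estimates) / len(estimates)
    return {
        "distinct": len(set(estimates)),
        "min": lowest,
        "max": highest,
        "relative_spread": (highest - lowest) / mean,
    }
```

Calling `summarize(collect_estimates())` against a live node gives the spread over 20 runs. Applied to the four responses listed above, `summarize` reports four distinct values with a max-min spread of roughly 0.4% of their mean.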
Another observation: once I downgraded Druid back to 0.18.1, groupBy
returned a consistent result for the datasource ingested by the newer version.
Is this new HLL behavior expected starting from 0.19.+?
If not, could you please help find out what could contribute to this
inconsistency?