Jimexist opened a new issue #11824: URL: https://github.com/apache/druid/issues/11824
### Motivation

The current HyperLogLog implementation works well for most use cases and is battle-tested. However, it is not configurable. For example, the number of buckets, which controls accuracy, is fixed at 2048 (2**11). Making this configurable was proposed in https://github.com/apache/druid/issues/4617, but the current code structure does not seem easy to modify.

### Proposed changes

I'd like to propose adding a newer implementation, following [Redis's version](https://github.com/redis/redis/blob/unstable/src/hyperloglog.c#L997), in three steps:

1. Add a dense-only version that still uses 2**11 buckets.
2. Add a configuration parameter that controls accuracy, between 10 and 16.
3. Add a sparse implementation.

If all goes well, we can then consider making the switch.

This section should include any changes made to user-facing interfaces, for example:

- Parameters
- JSON query/ingest specs
- SQL language
- Emitted metrics

### Rationale

A discussion of why this particular solution is the best one. One good way to approach this is to discuss other alternative solutions that you considered and decided against. This should also include a discussion of any specific benefits or drawbacks you are aware of.

### Operational impact

This section should describe how the proposed changes will impact the operation of existing clusters. It should answer questions such as:

- Is anything going to be deprecated or removed by this change? How will we phase out old behavior?
- Is there a migration path that cluster operators need to be aware of?
- Will there be any effect on the ability to do a rolling upgrade, or to do a rolling _downgrade_ if an operator wants to switch back to a previous version?

### Test plan (optional)

An optional discussion of how the proposed changes will be tested. This section should focus on higher level system test strategy and not unit tests (as UTs will be implementation dependent).
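To make steps 1 and 2 of the proposed changes more concrete, here is a minimal Python sketch of a dense-only HyperLogLog with a configurable bucket count. It is not Druid's code and not a port of the Redis implementation: the class name `DenseHLL`, the `precision` parameter (assumed here to be the log2 of the bucket count, so 11 gives 2**11 buckets), the BLAKE2b hash, and the classic raw estimator with a linear-counting correction for small cardinalities are all illustrative assumptions.

```python
import hashlib
import math


class DenseHLL:
    """Hypothetical dense HyperLogLog: one byte per bucket, 2**precision buckets."""

    def __init__(self, precision: int = 11):
        # Assumption: the "accuracy between 10-16" setting is log2(bucket count).
        if not 10 <= precision <= 16:
            raise ValueError("precision must be between 10 and 16")
        self.p = precision
        self.m = 1 << precision
        self.registers = bytearray(self.m)

    def add(self, value: bytes) -> None:
        # 64-bit hash; BLAKE2b is an arbitrary choice for this sketch.
        h = int.from_bytes(hashlib.blake2b(value, digest_size=8).digest(), "big")
        width = 64 - self.p
        idx = h >> width                    # top p bits pick the bucket
        rest = h & ((1 << width) - 1)       # remaining bits feed the rank
        # Rank = 1-based position of the leftmost 1-bit in `rest`
        # (width + 1 when `rest` is zero).
        rank = width - rest.bit_length() + 1
        if rank > self.registers[idx]:
            self.registers[idx] = rank

    def merge(self, other: "DenseHLL") -> None:
        # Sketches are only mergeable when they use the same bucket count.
        if other.p != self.p:
            raise ValueError("cannot merge sketches with different precision")
        for i, r in enumerate(other.registers):
            if r > self.registers[i]:
                self.registers[i] = r

    def estimate(self) -> float:
        m = self.m
        alpha = 0.7213 / (1 + 1.079 / m)    # bias constant, valid for m >= 128
        raw = alpha * m * m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * m and zeros:
            # Small-range correction: fall back to linear counting.
            return m * math.log(m / zeros)
        return raw
```

With m buckets the relative standard error of HyperLogLog is roughly 1.04 / sqrt(m), so under the assumption above the 10-16 range would span about 3.3% (2**10 buckets) down to about 0.4% (2**16 buckets), with the current 2**11 buckets at about 2.3%. The `merge` check also hints at one operational question for a configurable version: sketches built with different bucket counts cannot be combined register by register.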
### Future work (optional)

An optional discussion of things that you believe are out of scope for the particular proposal but would be nice follow-ups. It helps show where a particular change could be leading us. There isn't any commitment that the proposal author will actually work on the items discussed in this section.

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
