leerho commented on issue #6865: Densify swapped hll buffer
URL: https://github.com/apache/incubator-druid/pull/6865#issuecomment-462086552

@gianm Looking for guidance here...

I have spent the past few days studying the Druid-HLL code and have uncovered at least a half-dozen serious bugs, and I haven't even started on the merge logic, which from a brief look also has very serious problems. A number of these problems are interconnected, so you can't just fix them one at a time. This code needs to be redesigned from scratch. I'm not sure I want to undertake this, but if I did, I would insist on some major changes.

The API will require some changes. The biggest one is removing the ability for users to specify the hash function. Any users who currently do that (that is, use a different hash function) and have historical stored images may not be able to use their history. Many internal methods that are now public will become private or package-private, e.g., access to internals such as the overflow registers, getNumNonZeroRegisters, getHeaderBytes, getPayloadBytePosition, setVersion, etc. I may also insist on a merge class rather than a merge method ( fold() ).

The storage would be a little larger (from 1031 bytes to perhaps 1046 bytes), and merge performance may be a bit slower.

It could still be backward compatible, but old images will still propagate errors into the new design, and there is nothing that can be done about that. Users who record their history with the new design will see much better error performance.

This new design would have to be extensively characterized and tested.

Even with this redesign, I would insist that it still be deprecated in favor of the DS-HLL sketch, which will have even better accuracy for the same size and is more flexible (different values of k, merging across different k's, etc.).

Thoughts?
