leerho edited a comment on issue #6865: Densify swapped hll buffer URL: https://github.com/apache/incubator-druid/pull/6865#issuecomment-462086552 @gianm Looking for guidance here... I have spent the past few days studying the Druid-HLL code and have uncovered at least a half-dozen serious bugs and haven't even started on the merge logic, which from a brief look also has very serious problems. A number of these problems are interconnected, so you can't just fix one at a time. This code needs to be redesigned from scratch. I'm not sure I want to undertake this, but if I were, I would insist on some major changes. The API will have some required changes: The biggest one is removing the ability for users to specify the hash function. Any users that are currently doing that, using a different hash function and have historical stored images may not be able to use their history. I am considering 2 strategies: 1. **Rewrite existing code**: Many current internal methods that are now public will become private or package-private: e.g., no public access to internals such as the overflow registers, getNumNonZeroRegisters, getHeaderBytes, getPayloadBytePosition, setVersion, etc. I may also insist on a merge class rather than a merge method ( fold() ). The storage would be a little larger (from 1031 bytes to perhaps 1080 bytes). And merge performance may be a bit slower. It could still be backward compatible, but old images will still propagate errors into the new design and there is nothing that can be done about that. Users that record their history with the new design will see much better error performance. 2. **API Wrapper to the DS-HLL sketch**. This will require some changes to the DS-HLL code, but may be easier to do and would eliminate a lot of code duplication. Basically I have to allow for the non-seeded hash function, adapt to the big-endian fields used by the Druid-HLL, create a merge function that can read the old Druid-HLL images, etc. The advantage of this is (hopefully) a single code-base for the sketch internals with two different wrappers, one for the old Druid-HLL users, and a new full-function API wrapper for the DS-HLL customers. Whatever, the new design would have to be extensively characterized and tested. Thoughts?
---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [email protected] With regards, Apache Git Services --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
