leerho commented on issue #6865: Densify swapped hll buffer
URL: https://github.com/apache/incubator-druid/pull/6865#issuecomment-460878071

I agree with @gianm.

> We had an upstream data producer who was sampling data. The sampling algorithm seemed to be based on Murmur3_128, or at least a related algorithm where the hash collisions were similar. When doing a HLL sketch of the dimension values, we were getting really weird results where all the HLL buckets would end up with values that were not good sketches of the input data (every bucket nibble with a `1` for example).

Yikes! I'm not sure I understand, but I hope that you are not sampling data *prior* to feeding it to a sketch. This will produce potentially horrible errors no matter what sketch you use. It also doesn't matter what hash function was used in the sampling. Sketches are *streaming algorithms* and rely on being fed every item of the stream.

Nonetheless, these weird results, with 1's in every nibble, are a catastrophic failure of the sketch; I don't care what values were fed to it. There must be something very unusual about your use of the sketch. Some more detail about how you are using and feeding the sketch would be helpful.
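
As a rough illustration of why sampling *prior* to sketching hurts, here is a minimal sketch in Python. It is not taken from Druid or from any sketch library: an exact `set` stands in for the sketch so that any error comes from the sampling alone, and the stream shape, the 10% rate, and the helper name `distinct_after_sampling` are all assumptions made up for the example.

```python
import random

def distinct_after_sampling(stream, rate, seed=0):
    """Count distinct items among events kept by uniform random sampling.

    An exact set stands in for the sketch here (hypothetical helper),
    so the only source of error is the sampling step itself.
    """
    rng = random.Random(seed)
    return len({item for item in stream if rng.random() < rate})

# Assumed stream: 1000 distinct items, each appearing 10 times.
true_distinct = 1000
repeats = 10
stream = [i for _ in range(repeats) for i in range(true_distinct)]

rate = 0.1
exact = len(set(stream))                        # every event is seen
sampled = distinct_after_sampling(stream, rate)  # only ~10% of events are seen

print("exact distinct count:      ", exact)
print("distinct count after sample:", sampled)
print("naively rescaled by 1/rate: ", int(sampled / rate))
```

Under these assumptions an item survives the sampling with probability `1 - (1 - rate) ** repeats`, roughly 65% here, so the sampled count is too low while the rescaled count is several times too high. The surviving fraction depends on how often each item repeats, which is usually unknown, so no fixed correction factor recovers the true count.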
