leerho commented on issue #6865: Densify swapped hll buffer
URL: https://github.com/apache/incubator-druid/pull/6865#issuecomment-460878071
 
 
   I agree with @gianm.  
   
   > We had an upstream data producer who was sampling data. The sampling 
algorithm seemed to be based on Murmur3_128, or at least a related algorithm 
where the hash collisions were similar. When doing a HLL sketch of the 
dimension values, we were getting really weird results where all the HLL 
buckets would end up with values that were not good sketches of the input data 
(every bucket nibble with a `1` for example).
   
   Yikes!
   
   I'm not sure I understand, but I hope that you are not sampling data *prior* 
to feeding it to a sketch. This can produce horrible errors no matter which 
sketch you use, and it doesn't matter which hash function was used for the 
sampling.  Sketches are *streaming algorithms* and rely on being fed every 
item of the stream.  
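
   To illustrate the point about sampling, here is a minimal sketch in Python. 
It uses an exact `set` in place of an HLL sketch, and the streams, the 10% rate, 
and the helper name `sampled_distinct` are all made up for the example. It is 
not taken from Druid or from the sketch library.

```python
import random


def sampled_distinct(stream, rate, seed=42):
    """Count distinct items that survive uniform random sampling at `rate`.

    Hypothetical helper: an exact set stands in for a distinct-count sketch.
    """
    rng = random.Random(seed)
    return len({item for item in stream if rng.random() < rate})


# Two streams with the same number of rows but very different cardinality.
all_unique = list(range(10_000))                  # 10,000 distinct values
heavy_repeats = [i % 100 for i in range(10_000)]  # 100 distinct values

rate = 0.10
a = sampled_distinct(all_unique, rate)
b = sampled_distinct(heavy_repeats, rate)

print("all_unique:    true =", len(set(all_unique)), " after sampling =", a)
print("heavy_repeats: true =", len(set(heavy_repeats)), " after sampling =", b)
```

   The first stream loses roughly 90% of its distinct values, while the second 
loses almost none, so no single correction factor recovers the true count for 
both. Whatever counts the distinct items downstream only sees what the sampler 
let through.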
   
   Nonetheless, these weird results with 1's in every nibble are a catastrophic 
failure of the sketch, regardless of what values were fed to it.  There must be 
something very unusual about your use of the sketch.  Some more detail about 
how you are using and feeding the sketch would be helpful.
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services

