Dylan1312 commented on issue #6814: [Discuss] Replacing hyperUnique as 
'default' distinct count sketch
URL: 
https://github.com/apache/incubator-druid/issues/6814#issuecomment-482727132
 
 
   Hi Lee,
   
   Thanks for the response!
   
   - I've tried various configurations: each of HLL4, HLL6 and HLL8, with lgK 
values of 6 and 12 for each.
   
   - I'm comparing the time to complete timeseries queries over a small number 
of segments.
   
     One query uses a hyperUnique (Druid's native HLL) aggregator on a column 
of hyperUnique HLL sketches; the others each use a single HLLSketchMerge 
aggregator against a column ingested with HLLSketchBuild.
   
     I noticed that the aggregator spends a significant portion of its time 
passing a ByteBuffer to HLLSketch::wrap; "deserialization" may be the wrong 
term for this :).
   
   - With ~18.5M sketches and a historical with a single core, I see:
           - An HLL8 sketch queried with lgK 6 takes around 2.5 seconds to 
complete.
           - A hyperUnique sketch takes around 1.6 seconds to complete.
   
   - I collected the data I'm using in a fairly ad hoc way from a stream, so 
I'm not sure. I expect the distribution to be fairly even, but this is 
something I can investigate more.
   
   - I haven't specifically looked at accuracy, but I'm seeing both sketches 
give answers within 0.4% of each other.
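
   To put the timings above in per-sketch terms, here is a rough 
back-of-the-envelope sketch in Python. It only reuses the approximate figures 
quoted above (~18.5M sketches, 2.5 s, 1.6 s); the constant and function names 
are made up for illustration, and it assumes the query time is dominated by 
per-sketch work, which may not hold exactly:

```python
# Approximate figures quoted in this comment; these are not new measurements.
NUM_SKETCHES = 18_500_000
HLL8_LGK6_SECONDS = 2.5      # HLLSketchMerge query, HLL8 with lgK 6
HYPERUNIQUE_SECONDS = 1.6    # hyperUnique query


def per_sketch_ns(total_seconds: float, num_sketches: int) -> float:
    """Average wall-clock time per merged sketch, in nanoseconds."""
    return total_seconds / num_sketches * 1e9


def slowdown(slower_seconds: float, faster_seconds: float) -> float:
    """How many times longer the slower query takes than the faster one."""
    return slower_seconds / faster_seconds


hll_ns = per_sketch_ns(HLL8_LGK6_SECONDS, NUM_SKETCHES)
hyper_ns = per_sketch_ns(HYPERUNIQUE_SECONDS, NUM_SKETCHES)
ratio = slowdown(HLL8_LGK6_SECONDS, HYPERUNIQUE_SECONDS)

print(f"HLLSketchMerge: ~{hll_ns:.0f} ns per sketch")
print(f"hyperUnique:    ~{hyper_ns:.0f} ns per sketch")
print(f"ratio:          ~{ratio:.2f}x")
```

   That works out to roughly 135 ns versus 86 ns per sketch on a single core, 
i.e. the HLLSketchMerge query takes about 1.56x as long as the hyperUnique one.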
   
   Best regards,
   Dylan

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 