Dylan1312 commented on issue #6814: [Discuss] Replacing hyperUnique as 'default' distinct count sketch
URL: https://github.com/apache/incubator-druid/issues/6814#issuecomment-482727132

Hi Lee,

Thanks for the response!

- I've tried various configurations: each of HLL4, HLL6, and HLL8, with lgK values of 6 and 12 for each.
- I'm comparing the time to complete timeseries queries over a small number of segments. One query uses a hyperUnique (Druid's native HLL) aggregator on a column of hyperUnique HLL sketches; the others each use a single HLLSketchMerge aggregator against a column ingested with HLLSketchBuild. I noticed that the aggregator spends a significant portion of its time passing a ByteBuffer to HLLSketch::wrap, so "deserialization" may be the wrong term :).
- With ~18.5M sketches and a historical with a single core I see:
  - An HLL8 sketch queried with lgK 6 takes around 2.5 seconds to complete.
  - A hyperUnique sketch takes around 1.6 seconds to complete.
- I collected the data I'm using in a fairly ad hoc way from a stream, so I'm not sure. I expect the distribution to be fairly even, but this is something I can investigate more.
- I haven't specifically looked at accuracy, but I'm seeing both sketches give answers within 0.4% of each other.

Best regards,
Dylan
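
For reference, the two query shapes being compared can be sketched roughly as below. This is only an illustration of the setup described above, not the exact queries that were run: the data source name, interval, and column names (`events`, `user_hyperunique`, `user_hll`) are hypothetical, and the `tgtHllType` values are written in the `HLL_4`/`HLL_6`/`HLL_8` form on the assumption that this is how the aggregator expects them.

```python
import json


def timeseries_query(data_source, interval, aggregator):
    """Build a Druid native timeseries query with a single aggregator."""
    return {
        "queryType": "timeseries",
        "dataSource": data_source,
        "granularity": "all",
        "intervals": [interval],
        "aggregations": [aggregator],
    }


def hyper_unique(name, field):
    """Aggregator spec for Druid's native hyperUnique column."""
    return {"type": "hyperUnique", "name": name, "fieldName": field}


def hll_sketch_merge(name, field, lg_k, tgt_hll_type):
    """Aggregator spec for a column ingested with HLLSketchBuild."""
    return {
        "type": "HLLSketchMerge",
        "name": name,
        "fieldName": field,
        "lgK": lg_k,
        "tgtHllType": tgt_hll_type,
    }


# Hypothetical data source, interval, and column names.
DATA_SOURCE = "events"
INTERVAL = "2019-04-01/2019-04-02"

# Baseline: one hyperUnique query.
baseline = timeseries_query(
    DATA_SOURCE, INTERVAL, hyper_unique("uniques", "user_hyperunique")
)

# Candidates: one HLLSketchMerge query per (tgtHllType, lgK) combination.
candidates = [
    timeseries_query(
        DATA_SOURCE, INTERVAL, hll_sketch_merge("uniques", "user_hll", lg_k, hll_type)
    )
    for hll_type in ("HLL_4", "HLL_6", "HLL_8")
    for lg_k in (6, 12)
]

if __name__ == "__main__":
    print(json.dumps(baseline, indent=2))
    print(json.dumps(candidates[0], indent=2))
```

Each dictionary serializes to a JSON body that could be timed against the same segments, so only the aggregator spec differs between runs.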
