gianm commented on issue #8194: HllSketch Merge/Build BufferAggregators: Speed up init with prebuilt sketch.
URL: https://github.com/apache/incubator-druid/pull/8194#issuecomment-516553469

> Druid (so far) always allocates space for the maximum size sketch and cannot take advantage of the fact that the size of the sketch for small cardinalities can be quite small and then grow. If this is going to remain true, then it might make sense to add this option. Druid is wasting a lot of memory because of the fixed allocation sizes and this has been discussed at length in other threads, which I'm not going to repeat here. Nonetheless, I am likely out-of-date with what the latest thinking within the Druid team is on this topic, so please update me :)

The most recent discussion has been happening here: #8126. I am not sure yet what form resizable aggregators will take, but let's assume they will happen!

> Clearly our focus has been to optimize the best possible accuracy for a given size. But if you are willing to allow the sketch to start at maximum size this may make sense for you. It could mean a substantial speed up. When implemented, it will be a configuration option, so you could always play with it and change your mind.

It definitely sounds like a useful option, especially if the accuracy decline isn't too bad. Maybe some hybrid approach would also be useful: starting at some medium point rather than very small or fully dense?

> The downside of eliminating the sparse mode is that the accuracy of the sketch for low cardinalities will be worse. It will be within the error guarantees for a given _k_ and behave like the conventional Flajolet-Martin HLL sketch at low cardinalities. Our HLL sparse mode takes advantage of new estimators that were developed for our CPC sketch which outperforms the accuracy / byte of almost any other HLL sketch out there by a large margin.

Any idea how much worse?
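
For a rough sense of scale, here is a back-of-the-envelope sketch in Python. It is not taken from the DataSketches code or its published error tables. It assumes the textbook HyperLogLog relative standard error of about 1.04 / sqrt(k) for k = 2^lgK registers, and it treats the dense register array as k registers of a fixed bit width with no header or auxiliary structures. The function names and the 4/6/8-bit widths are illustrative assumptions only.

```python
import math


def classic_hll_rse(lg_k: int) -> float:
    """Textbook HyperLogLog relative standard error, ~1.04 / sqrt(k).

    Assumption: this is the classic estimator's figure, not the
    DataSketches error guarantee for a given k.
    """
    k = 1 << lg_k
    return 1.04 / math.sqrt(k)


def dense_register_bytes(lg_k: int, bits_per_register: int) -> int:
    """Bytes needed for k fixed-width registers, ignoring any header.

    Assumption: a real sketch adds a preamble and possibly auxiliary
    structures, so its true serialized size will differ.
    """
    k = 1 << lg_k
    return math.ceil(k * bits_per_register / 8)


if __name__ == "__main__":
    for lg_k in (10, 12, 14):
        rse = classic_hll_rse(lg_k)
        sizes = {bits: dense_register_bytes(lg_k, bits) for bits in (4, 6, 8)}
        print(f"lgK={lg_k}: RSE ~ {rse:.4f}, dense bytes by register width: {sizes}")
```

The point is only that a fully dense allocation costs the same number of bytes whether the sketch has seen ten values or ten million, which is why the low-cardinality accuracy trade-off matters.
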
> We are in the middle of this Apache migration, so implementing this may be some months out, unless we could get some help speeding up the migration :) :)

I joined the DataSketches dev list a few days ago. Is there anything I could do to help?
