gianm commented on issue #8194: HllSketch Merge/Build BufferAggregators: Speed up init with prebuilt sketch.
URL: https://github.com/apache/incubator-druid/pull/8194#issuecomment-516553469
 
 
   > Druid (so far) always allocates space for the maximum-size sketch and cannot take advantage of the fact that the size of the sketch for small cardinalities can be quite small and then grow. If this is going to remain true, then it might make sense to add this option. Druid is wasting a lot of memory because of the fixed allocation sizes, and this has been discussed at length in other threads, so I won't repeat it here. Nonetheless, I am likely out of date on the Druid team's latest thinking on this topic, so please update me :)
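
Because every slot is fixed at the maximum size, the "prebuilt sketch" idea in this PR's title can be pictured as building one empty sketch image up front and bulk-copying it into each slot on `init`, instead of constructing a new sketch every time. The Python below is only a language-neutral sketch of that idea, not Druid's Java `BufferAggregator` API; the class name, the `max_intermediate_size` property, and the byte layout of the template are all hypothetical (in Druid the real template would come from the DataSketches library).

```python
class PrebuiltInitAggregator:
    """Hypothetical buffer aggregator: each slot is a fixed, max-size region,
    initialized by copying a prebuilt empty-sketch image."""

    def __init__(self, empty_sketch_image: bytes) -> None:
        # Built once; every slot receives an identical copy of these bytes.
        self._template = bytes(empty_sketch_image)

    @property
    def max_intermediate_size(self) -> int:
        # Fixed allocation: each slot is always as large as the biggest sketch.
        return len(self._template)

    def init(self, buf: bytearray, position: int) -> None:
        end = position + len(self._template)
        if position < 0 or end > len(buf):
            raise ValueError("slot does not fit in the buffer")
        # One bulk copy instead of constructing and serializing a new sketch.
        buf[position:end] = self._template


# Usage: a made-up 8-byte header followed by 16 zeroed register bytes.
empty_image = b"HDR" + bytes(5) + bytes(16)
agg = PrebuiltInitAggregator(empty_image)
buf = bytearray(3 * agg.max_intermediate_size)  # room for three fixed-size slots
for slot in range(3):
    agg.init(buf, slot * agg.max_intermediate_size)
```

The trade-off is the one described above: init becomes a cheap memory copy, but only because each slot already reserves the full maximum size.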
   
   The most recent discussion has been happening here: #8126. I am not sure yet what form resizable aggregators will take, but let's assume they will happen!
   
   > Clearly our focus has been to optimize for the best possible accuracy for a given size. But if you are willing to allow the sketch to start at maximum size, this may make sense for you. It could mean a substantial speedup. When implemented, it will be a configuration option, so you could always play with it and change your mind.
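
To make "start at maximum size" concrete: a dense-only HLL allocates all registers up front, so every update is a fixed-cost hash, index, and max with no mode checks or promotions. This is a minimal sketch of the classic dense update, not the DataSketches implementation; `blake2b` stands in for the MurmurHash3 hash DataSketches uses, and the one-byte-per-register layout is only similar in spirit to its `HLL_8` type.

```python
import hashlib


def hash64(item: str) -> int:
    # Stand-in 64-bit hash (blake2b); not the hash DataSketches actually uses.
    digest = hashlib.blake2b(item.encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big")


class DenseHll:
    """Minimal dense-only HLL: all 2**lg_k registers exist from the start."""

    def __init__(self, lg_k: int) -> None:
        self.lg_k = lg_k
        self.registers = bytearray(1 << lg_k)  # one byte per register

    def update(self, item: str) -> None:
        h = hash64(item)
        width = 64 - self.lg_k
        idx = h >> width                   # top lg_k bits select a register
        rest = h & ((1 << width) - 1)      # remaining low bits
        # rho = 1-based position of the leftmost 1-bit in `rest` (width + 1 if zero)
        rho = width - rest.bit_length() + 1
        if rho > self.registers[idx]:
            self.registers[idx] = rho


sketch = DenseHll(lg_k=10)
for i in range(1000):
    sketch.update(f"user-{i}")
```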
   
   It definitely sounds like a useful option, especially if the accuracy decline isn't too bad. Maybe a hybrid approach would also be useful: starting at some medium point rather than very small or fully dense?
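
One way to picture that hybrid is a single configurable promotion threshold: `promote_at=0` starts dense, a very large value stays sparse, and anything in between starts small and switches to dense once that many registers are occupied. This is a toy sketch under that assumption, not how DataSketches structures its modes; the class and parameter names are hypothetical, and its sparse mode only saves memory, without the improved low-cardinality estimators described in the reply that follows.

```python
import hashlib


def hash64(item: str) -> int:
    # Stand-in 64-bit hash (blake2b), for illustration only.
    digest = hashlib.blake2b(item.encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big")


class HybridHll:
    """Toy HLL: sparse dict {register: value} until more than `promote_at`
    registers are occupied, then a dense register array."""

    def __init__(self, lg_k: int, promote_at: int) -> None:
        self.lg_k = lg_k
        self.promote_at = promote_at
        start_dense = promote_at <= 0
        self.sparse = None if start_dense else {}
        self.dense = bytearray(1 << lg_k) if start_dense else None

    def _slot(self, item: str):
        h = hash64(item)
        width = 64 - self.lg_k
        idx = h >> width
        rho = width - (h & ((1 << width) - 1)).bit_length() + 1
        return idx, rho

    def update(self, item: str) -> None:
        idx, rho = self._slot(item)
        if self.dense is not None:
            if rho > self.dense[idx]:
                self.dense[idx] = rho
            return
        if rho > self.sparse.get(idx, 0):
            self.sparse[idx] = rho
        if len(self.sparse) > self.promote_at:
            self._promote()

    def _promote(self) -> None:
        # Allocate the full register array once and copy the sparse entries in.
        self.dense = bytearray(1 << self.lg_k)
        for idx, rho in self.sparse.items():
            self.dense[idx] = rho
        self.sparse = None

    def registers(self) -> bytearray:
        if self.dense is not None:
            return bytearray(self.dense)
        out = bytearray(1 << self.lg_k)
        for idx, rho in self.sparse.items():
            out[idx] = rho
        return out
```

Since every update is a per-register max, the final register contents are the same whatever the threshold; only memory use and per-update cost differ.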
   
   > The downside of eliminating the sparse mode is that the accuracy of the sketch for low cardinalities will be worse. It will be within the error guarantees for a given _k_ and behave like the conventional Flajolet-Martin HLL sketch at low cardinalities. Our HLL sparse mode takes advantage of new estimators that were developed for our CPC sketch, which outperforms almost any other HLL sketch out there in accuracy per byte by a large margin.
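
For reference, "conventional" low-cardinality behavior usually means the classic HyperLogLog estimator of Flajolet et al. with its linear-counting correction, plus the familiar asymptotic error of about 1.04 / sqrt(k). The sketch below assumes k = 2**lg_k registers and m >= 128 (for the alpha approximation). It is the textbook estimator, not the DataSketches HLL or CPC estimators mentioned above, so it says nothing about how much better those do.

```python
import math


def hll_estimate(registers) -> float:
    """Classic HyperLogLog estimate with the small-range (linear counting) fix."""
    m = len(registers)
    if m < 128:
        raise ValueError("the alpha approximation below assumes m >= 128")
    alpha = 0.7213 / (1 + 1.079 / m)
    raw = alpha * m * m / sum(2.0 ** -r for r in registers)
    zeros = sum(1 for r in registers if r == 0)
    if raw <= 2.5 * m and zeros > 0:
        # Low cardinality: count empty registers instead (linear counting).
        return m * math.log(m / zeros)
    return raw


def relative_standard_error(lg_k: int) -> float:
    # Asymptotic relative standard error of classic HLL: about 1.04 / sqrt(k).
    return 1.04 / math.sqrt(1 << lg_k)
```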
   
   Any idea how much worse?
   
   > We are in the middle of this Apache migration, so implementing this may be some months out, unless we could get some help speeding up the migration :) :)
   
   I joined the DataSketches dev list a few days ago; is there anything I could do to help?
