himanshug commented on issue #6743: IncrementalIndex generally overestimates theta sketch size URL: https://github.com/apache/incubator-druid/issues/6743#issuecomment-486450735

@leerho It appears that Postgres does have its own memory allocator, which provides the "palloc" and "pfree" methods. @gianm was suggesting something similar. In that case the DS library would offer some way of passing in those functions. Druid (or other users of DS) would implement the memory allocator in whatever way makes the most sense for them (e.g. allocating a big chunk of memory at startup and then handing out chunks of it in "palloc", or delegating each "palloc" to the underlying JVM heap or the OS ...).

I looked into this a long time ago, and one way of hacking it was to use "MemoryRegion" and "MemoryRequest", as in https://github.com/himanshug/druid/blob/growable_aggregator_final/extensions/datasketches/src/main/java/io/druid/query/aggregation/datasketches/theta/SketchResizableBufferAggregator.java#L120 (as you might guess, this is based on a pretty old version of the DS library :) ).

@gianm for IncrementalIndex, if the above is done, the simplest approach would be to use BufferAggregator, which would also be more accurate than trying to do sizeOf(aggregator). The current implementation, which spills based on `getMaxIntermediateSize()`, is puzzling to me, as the number returned there is totally unrelated to what the smallest/current/largest heap utilization of an on-heap Aggregator would be. That number is only relevant when BufferAggregator is used.
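
To make the pluggable-allocator idea concrete, here is a rough Java sketch of what such a hook might look like. Everything in it is an assumption for illustration: `SketchMemoryAllocator`, `ArenaAllocator`, and `HeapAllocator` are hypothetical names, not part of the DS library or Druid, and the arena uses a very naive bump-pointer scheme that only reclaims space once every chunk has been freed.

```java
import java.nio.ByteBuffer;

// Hypothetical callback interface, loosely modeled on palloc/pfree.
// This is NOT an existing DS library or Druid API.
interface SketchMemoryAllocator {
    ByteBuffer allocate(int capacityBytes);
    void free(ByteBuffer buffer);
}

// Strategy 1: grab one big chunk at startup and hand out slices of it.
final class ArenaAllocator implements SketchMemoryAllocator {
    private final ByteBuffer arena;
    private int nextOffset = 0; // bump pointer into the arena
    private int liveBytes = 0;  // bytes currently handed out

    ArenaAllocator(int arenaBytes) {
        this.arena = ByteBuffer.allocateDirect(arenaBytes);
    }

    @Override
    public synchronized ByteBuffer allocate(int capacityBytes) {
        if (capacityBytes < 0) {
            throw new IllegalArgumentException("negative size");
        }
        if (capacityBytes > arena.capacity() - nextOffset) {
            throw new IllegalStateException("arena exhausted");
        }
        // Carve an independent view over [nextOffset, nextOffset + capacityBytes).
        ByteBuffer view = arena.duplicate();
        view.position(nextOffset);
        view.limit(nextOffset + capacityBytes);
        ByteBuffer slice = view.slice();
        nextOffset += capacityBytes;
        liveBytes += capacityBytes;
        return slice;
    }

    @Override
    public synchronized void free(ByteBuffer buffer) {
        liveBytes -= buffer.capacity();
        if (liveBytes == 0) {
            nextOffset = 0; // naive reclamation: reset once nothing is live
        }
    }

    // Exact number of bytes currently handed out to callers.
    synchronized int usedBytes() {
        return liveBytes;
    }
}

// Strategy 2: delegate every request to the JVM heap.
final class HeapAllocator implements SketchMemoryAllocator {
    @Override
    public ByteBuffer allocate(int capacityBytes) {
        return ByteBuffer.allocate(capacityBytes);
    }

    @Override
    public void free(ByteBuffer buffer) {
        // Nothing to do: the garbage collector reclaims heap buffers.
    }
}
```

With something along these lines, a caller such as IncrementalIndex could decide when to spill from the exact figure reported by `usedBytes()` instead of from a worst-case estimate.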
