gianm commented on issue #6743: IncrementalIndex generally overestimates theta 
sketch size
URL: 
https://github.com/apache/incubator-druid/issues/6743#issuecomment-449146550
 
 
   > I don't have a deep understanding of the inner workings of the memory 
allocation strategy in Druid, but I should point out that the current model of 
allocating equal-sized slots in a Buffer where each slot is the maximum 
possible size of a sketch is very likely a horrible waste of memory space.
   
   Yeah, it's not ideal, especially not for some of the newer sketches. (The 
granddaddy of Druid sketches, hyperUnique, doesn't have a very high max memory 
footprint, so it was less of an issue back then.)
   
   > What I would recommend is that if we could work together, we could come up 
with a much more efficient memory management model for sketches in Druid that 
would allow you to recapture most if not all of that wasted space. This will 
likely require some changes in Druid as well as a change in how sketches use 
and allocate memory.
   
   That would be awesome.
   
   Rather than the Druid memory manager allowing for off-heap space that it 
doesn't control, what if Druid _did_ control it, but basically delegated that 
control to the sketch? Druid's query engine works by allocating a fixed-size 
"processing buffer" to each compute thread. When a thread processes a segment, 
it allocates all the memory it needs out of that buffer. After the segment is 
done being processed, the results are transferred elsewhere, and the processing 
buffer is reused for the next segment to be processed. The processing buffers 
are typically 500MB to 2GB in size, and they are preallocated at server startup 
to avoid "surprises" (one buffer per compute thread, which is a fixed-size pool 
generally set to the number of processors).
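   
   As a rough illustration of that fixed pool, here is a sketch of preallocated 
per-thread processing buffers (the class and method names here are made up for 
illustration, not Druid's actual ones):
   
   ```java
   // Illustrative sketch: one fixed-size direct buffer per compute thread,
   // preallocated at startup and reused across segments.
   import java.nio.ByteBuffer;
   import java.util.concurrent.ArrayBlockingQueue;
   import java.util.concurrent.BlockingQueue;
   
   public class ProcessingBufferPool {
     private final BlockingQueue<ByteBuffer> pool;
   
     public ProcessingBufferPool(int numThreads, int bufferSizeBytes) {
       pool = new ArrayBlockingQueue<>(numThreads);
       for (int i = 0; i < numThreads; i++) {
         // Preallocated up front, so there are no allocation "surprises" later.
         pool.add(ByteBuffer.allocateDirect(bufferSizeBytes));
       }
     }
   
     // A compute thread checks out a buffer before processing a segment...
     public ByteBuffer take() {
       return pool.poll();
     }
   
     // ...and returns it after the segment's results are transferred elsewhere.
     public void giveBack(ByteBuffer buffer) {
       buffer.clear(); // reset position/limit so the next segment starts fresh
       pool.add(buffer);
     }
   }
   ```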
   
   Right now, as I'm guessing you know, the protocol for aggregators getting 
space in that buffer is something like:
   
   1. Druid calls AggregatorFactory's getMaxIntermediateSize method to figure 
out how much memory to allocate per aggregator.
   2. Druid allocates that much memory per grouping tuple.
   3. Druid calls BufferAggregator's "init", "aggregate", and "get" methods to 
interact with the memory it has allocated.
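   
   As a concrete (simplified) illustration of that three-step protocol, here is 
a long-sum-style aggregator; the method names mirror Druid's BufferAggregator 
interface, but the classes are stand-ins, not Druid's real implementations:
   
   ```java
   // Simplified stand-in for the current fixed-slot protocol.
   import java.nio.ByteBuffer;
   
   class LongSumBufferAggregator {
     // Step 1: the factory reports a fixed max intermediate size.
     static int getMaxIntermediateSize() {
       return Long.BYTES;
     }
   
     // Step 3: init/aggregate/get operate on the position Druid handed out.
     void init(ByteBuffer buf, int position) {
       buf.putLong(position, 0L);
     }
   
     void aggregate(ByteBuffer buf, int position, long value) {
       buf.putLong(position, buf.getLong(position) + value);
     }
   
     long get(ByteBuffer buf, int position) {
       return buf.getLong(position);
     }
   }
   
   class Demo {
     static long run() {
       // Step 2: Druid allocates maxSize bytes per grouping tuple.
       int slotSize = LongSumBufferAggregator.getMaxIntermediateSize();
       ByteBuffer processing = ByteBuffer.allocate(slotSize * 2); // two tuples
       LongSumBufferAggregator agg = new LongSumBufferAggregator();
       agg.init(processing, 0);
       agg.aggregate(processing, 0, 3);
       agg.aggregate(processing, 0, 4);
       return agg.get(processing, 0);
     }
   }
   ```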
   
   Riffing off your idea, what I'm thinking is carving out a chunk of the 
processing buffer to be managed by the BufferAggregator impl:
   
   1. Druid calls a new AggregatorFactory "getTypicalIntermediateSize" method 
to figure out a "typical" size per aggregator, and getMaxIntermediateSize to 
figure out the max.
   2. Druid computes how many grouping tuples it could store in the buffer if 
each one had aggregators of the "typical" size. Call it N.
   3. Druid carves out an arena of size N * getTypicalIntermediateSize from the 
processing buffer, and passes it to the AggregatorFactory's "factorizeBuffered" 
method, which creates a BufferAggregator that is free to use that arena.
   4. For each grouping tuple, Druid stores not the aggregated value, but just 
the information needed to find the actual data in the arena.
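   
   The steps above could look roughly like this (a sketch under stated 
assumptions: "getTypicalIntermediateSize" is the proposed method name from this 
discussion, the bump-allocation is made up for illustration, and a real 
implementation would need to handle entries outgrowing their "typical" size):
   
   ```java
   // Hypothetical arena protocol: the per-tuple slot holds only an int
   // offset into an arena region that the aggregator itself manages.
   import java.nio.ByteBuffer;
   
   class ArenaBufferAggregator {
     static int getTypicalIntermediateSize() { return 16; } // "typical" case
     static int getMaxIntermediateSize() { return 64; }     // worst case
   
     private final ByteBuffer arena; // carved out of the processing buffer
     private int arenaCursor = 0;
   
     ArenaBufferAggregator(ByteBuffer arena) {
       this.arena = arena;
     }
   
     // Step 4: the tuple slot stores only the offset of the actual data.
     void init(ByteBuffer tupleBuf, int position) {
       tupleBuf.putInt(position, arenaCursor);
       arena.putLong(arenaCursor, 0L); // zero the newly claimed region
       arenaCursor += getTypicalIntermediateSize(); // naive bump allocation
     }
   
     void aggregate(ByteBuffer tupleBuf, int position, long value) {
       int offset = tupleBuf.getInt(position);
       arena.putLong(offset, arena.getLong(offset) + value);
     }
   
     long get(ByteBuffer tupleBuf, int position) {
       return arena.getLong(tupleBuf.getInt(position));
     }
   }
   ```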
   
   We could also tweak this protocol to skip the arena when "typical" equals 
"max", so primitive aggregators don't need to become more complex, and so we 
avoid the extra overhead of storing a pointer into the arena.
