leerho commented on issue #6743: IncrementalIndex generally overestimates theta 
sketch size
URL: 
https://github.com/apache/incubator-druid/issues/6743#issuecomment-448816707
 
 
   @gianm 
   
   I don't have a deep understanding of the inner workings of the memory 
allocation strategy in Druid, but I should point out that the current model of 
allocating equal sized slots in a Buffer where each slot is the maximum 
possible size of a sketch Is very likely a horrible waste of memory space.  
   
   Sketches, are designed to start very small in their memory requirements and 
then grow gradually as the size of the input stream grows. Unique counting 
sketches (Theta, HLL, CPC) actually reach a maximum size, beyond which they do 
not grow at all.  Quantile sketches can continue to grow but very very slowly 
(sub-logarithmically with the size of the input stream).
   
   For example, a Theta Sketch configured with LgK = 14 (~16K) has a maximum 
size of about 256KB. However, with only one entry it consumes only 16 Bytes.
   
   Our experience in using sketches in large systems is that the size of the 
input streams that feed the sketches tend to follow a power-law distribution.  
Which means given a system total number of sketches of one million, the vast 
majority of sketches will have very few entries, and just a few sketches will 
be full size.   
   
   > Note: If the sketches ingested into Druid are pre-aggregated sectors, for 
example, this power-law distribution in space required for the sketches is not 
as strong, as most of the sketches will already be at max size.  But in this 
case you are dealing with far fewer sketches, so the memory management concern 
may not be as severe.
   
   To illustrate this power-law effect, I have created a toy model of space 
utilization in the attached excel spreadsheet:
   
   
[DruidModel_pwrLaw.xlsx](https://github.com/apache/incubator-druid/files/2696971/DruidModel_pwrLaw.xlsx)
   
   For this model I assumed a power-law distribution of stream sizes from one 
unique to 134M uniques and a configured sketch size LgK = 14.  I adjusted the 
parameters so that the total number of sketches in the system is about one 
million.  
   
   With these assumptions, the actual storage bytes required by all of the 
sketches is about 440MB.  However, allocating the full size of 256KB for every 
sketch would require nearly 35GB.  This results in a space utilization of about 
1.27%!  This is a huge waste of memory! 
   
   ***
   What I would recommend is that if we could work together, we could come up 
with a much more efficient memory management model for sketches in Druid that 
would allow you to recapture most if not all of that wasted space.  This will 
likely require some changes in Druid as well as a change in how sketches use 
and allocate memory.
   
   One idea I have is to allow each sketch to self-allocate the space it needs 
as it grows. The sketch would be responsible for allocating new space and 
deallocating old space (especially for off-heap). 
   What would live in the equal-sized slots of the Druid buffer would be a 
small, fixed sized "wrapper" with a pointer and a few other values, where the 
data structure of the sketch would be in an off-heap space managed by the 
sketch itself.  This means that Druid memory manager would have to allow for 
some amount of off-heap space that it is not controlling.  In the toy model 
above, that would be about 440MB of off-heap space.  The benefit is that Druid 
would get back nearly 35GB of off-heap space for other uses.  It seems like a 
big win to me, but I don't understand what changes would have to happen in 
Druid to make this work. 
   
   We can certainly do the sketch work, but you (or someone deep into Druid) 
would have to make the appropriate changes in Druid.  One important aspect of 
the Druid work would be to provide some simple tools to analyze the actual 
distributions of stream sizes, sketch sizes and space utilizations before and 
after the change. This meta-data would be invaluable feedback for us to 
understand specific Druid customers' situations, and help us tune the system 
for even better performance going forward.
   
   What do you think?
   
   
   
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to