leerho commented on issue #12261:
URL: https://github.com/apache/druid/issues/12261#issuecomment-1040739418


   @kfaraz @AlexanderSaydakov @gianm @cheddar 
   1.  **PLEASE engage with us and help us understand what it is you are 
trying to do. If it means adding a public API to accomplish it, we are more 
than willing to do that.**   
   2.  Whatever you do, never access internal private methods of these 
sketches.  
   3. We placed a detailed and explicit warning Javadoc on the private gadget 
member precisely because we were worried that someone might try to do what you 
did.  
   [Warning Comment](https://github.com/apache/datasketches-java/blob/master/src/main/java/org/apache/datasketches/theta/UnionImpl.java#L59-L67)
   I don't know how we could have been more clear. 
   
   Accounting for the memory currently being used by a Union or Sketch is rather complex:   
   
   - Do you mean on-heap or off-heap memory, or both?  
   - The largest chunk of memory is the array of hashes, which is configured as 
a hash table. The fraction of that hash table's space consumed by valid 
hashes can vary enormously depending on the input stream and on the exact state of 
the union state machine. So, are you seeking to understand the 
total space consumed by the sketch, regardless of what fraction is actively 
being used at that moment? Unfortunately, this is a dynamic, non-deterministic 
variable. And if your application calls reset(), the space used will drop 
back down to a minimum.  
   - How do you intend to use this information?  Are you trying to forecast 
memory requirements in the future, based on some aggregate statistic across 
many sketches?  
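
To make the difference between "space allocated" and "space holding valid hashes" concrete, here is a small toy model in Java. It is hypothetical and is not the DataSketches implementation: the class names `HashTableFootprint` and `ToyHashTable`, the 0.5 load factor, and the table-doubling policy are all assumptions chosen for illustration, and the model omits the sketch-specific logic that bounds how large a real sketch's table can grow.

```java
/**
 * Toy model (hypothetical; NOT the DataSketches implementation) of a structure whose
 * largest memory component is an open-addressing hash table of 64-bit hashes.
 */
public class HashTableFootprint {

    static final class ToyHashTable {
        // Assumed policy: double the table once it is more than half full.
        private static final double RESIZE_THRESHOLD = 0.5;
        private final int minSlots; // must be a power of two
        private long[] slots;       // 0 marks an empty slot
        private int live;           // number of valid hashes stored

        ToyHashTable(int minSlots) {
            if (Integer.bitCount(minSlots) != 1) {
                throw new IllegalArgumentException("minSlots must be a power of two");
            }
            this.minSlots = minSlots;
            this.slots = new long[minSlots];
        }

        void update(long hash) {
            if (hash == 0) {
                return; // 0 is reserved as the empty-slot marker in this toy
            }
            if (insert(slots, hash)) {
                live++;
                if (live > RESIZE_THRESHOLD * slots.length) {
                    grow();
                }
            }
        }

        // Linear probing; returns true only if the hash was not already present.
        private static boolean insert(long[] table, long hash) {
            int mask = table.length - 1;
            int i = (int) (hash & mask);
            while (table[i] != 0) {
                if (table[i] == hash) {
                    return false;
                }
                i = (i + 1) & mask;
            }
            table[i] = hash;
            return true;
        }

        // Allocation jumps in steps, so part of it is usually empty.
        private void grow() {
            long[] bigger = new long[slots.length * 2];
            for (long h : slots) {
                if (h != 0) {
                    insert(bigger, h);
                }
            }
            slots = bigger;
        }

        // Drops all hashes and shrinks the allocation back to the minimum.
        void reset() {
            slots = new long[minSlots];
            live = 0;
        }

        long allocatedBytes() { return 8L * slots.length; }

        long liveBytes() { return 8L * live; }
    }

    public static void main(String[] args) {
        ToyHashTable table = new ToyHashTable(16);
        // An odd 64-bit multiplier maps distinct keys to distinct nonzero hashes.
        for (long key = 1; key <= 1000; key++) {
            table.update(key * 0x9E3779B97F4A7C15L);
        }
        System.out.println("allocated=" + table.allocatedBytes() + " live=" + table.liveBytes());
        table.reset();
        System.out.println("after reset: allocated=" + table.allocatedBytes());
    }
}
```

In this toy, the allocated bytes end up well above the bytes actually holding hashes, and the ratio between them depends on where the input happens to leave the table relative to its last resize; `reset()` then returns the allocation to its minimum.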
   
   We have done quite a bit of work on modeling aggregate sketch memory usage 
where thousands or millions of sketches are concurrently in memory. This work 
may be useful to you.  
   
   We like to hold up Druid's use of DataSketches as an exemplary model, and we 
point folks to your website and your code base so that they can understand how 
DataSketches can be integrated properly. We certainly don't want other 
platforms mimicking your access of sketches' internal private fields or methods.
   
   No one understands the internals of these sketches better than our 
DataSketches team, and we are very much interested in making sketches work 
optimally in the Druid environment.   But we cannot help you if you don't 
engage with us.
   
   In conclusion, it is my strong recommendation that you revert this commit 
(before it gets frozen in a release) and work with us to find a 
better, public solution.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


