leerho commented on issue #12261: URL: https://github.com/apache/druid/issues/12261#issuecomment-1040739418
@kfaraz @AlexanderSaydakov @gianm @cheddar

1. **PLEASE, engage with us and help us understand what it is you are trying to do. If it means adding a public API to accomplish it, we are more than willing to do that.**
2. Whatever you do, never access internal private methods of these sketches.
3. We placed a detailed and explicit warning Javadoc on the private gadget member precisely because we were worried that someone might try to do what you did. [Warning Comment](https://github.com/apache/datasketches-java/blob/master/src/main/java/org/apache/datasketches/theta/UnionImpl.java#L59-L67) I don't know how we could have been more clear.

The memory currently being used by a Union or Sketch is rather complex:

- Do you mean on-heap or off-heap memory, or both?
- The largest chunk of memory is the array of hashes, which is configured as a hash table. The fraction of that hash table's space consumed by valid hashes can vary enormously, depending on the input stream and the exact state of the union state machine.

---

So, are you seeking to understand the total space consumed by the sketch, regardless of what fraction is actively being used at that moment? Unfortunately, this is a dynamic, non-deterministic variable. And if your application makes use of reset(), it will reset the space used down to a minimum.

- How do you intend to use this information? Are you trying to forecast future memory requirements based on some aggregate statistic across many sketches? We have done quite a bit of work on modeling aggregate sketch memory usage where thousands or millions of sketches are concurrently in memory. Perhaps this may be useful to you.

We like to hold up Druid's use of DataSketches as an exemplary model, and we point folks to your website and your code base so that they can understand how DataSketches can be integrated properly. We certainly don't want other platforms mimicking your practice of accessing sketch-internal private fields or methods.
No one understands the internals of these sketches better than our DataSketches team, and we are very much interested in making sketches work optimally in the Druid environment. But we cannot help you if you don't engage with us.

In conclusion, it is my strong recommendation that you revert this commit (before it gets frozen in a release) and work with us to find a better, more public solution.
