leerho commented on issue #481:
URL: 
https://github.com/apache/datasketches-java/issues/481#issuecomment-1842232679

   @edmondliuTTD 
   Thank you for your contribution and you interest in our library.  It is 
clear you have studied the KLL code somewhat, however, studying your changes 
for just a few minutes, I can see that you are having to touch several very 
critical areas of the code, which means we will have to study what you are 
attempting to do quite carefully.  I am not discounting your idea, out-of-hand, 
but I do have some more questions for you:
   
   Sketches are probabilistic state machines and in order to fully understand 
their behavior over a wide range of possible inputs we have to perform 
exhaustive testing that stresses the sketch with millions of trials.   
   
   - So my first question is what testing have you done that could begin to 
establish that your changes do not violate the statistical properties of the 
KLL algorithm?  
   
   - Have you tested update speed performance, space performance, and merge 
accuracy and speed performance?
   
   - It appears that you are not taking advantage of the logarithmic properties 
of the levels array.  This means that your performance improvement may not be 
much better than externally feeding the sketch from a loop.  
   
   - It appears that some of your changes are a matter of style and have 
nothing to do with your targeted changes, e.g., moving static imports from the 
top of the set of imports to the bottom.  Or replacing our multiple specific 
static imports with "*" imports, which is not a good practice. This creates 
unnecessary review overload for us, as we have to ask, why did you do that?
   
   - We would like to know more about the specific problem you are trying to 
solve. And is the lack of being able to update multiple identical items at once 
a severe impediment?  Can you give us some examples?  How big is your total 
data set? what fraction of these are duplicates?  How critical is the update 
speed performance?
   
   I'm sure we will have more comments and questions as we dig deeper into this.
   Cheers
    


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to