c-dickens commented on PR #71: URL: https://github.com/apache/datasketches-rust/pull/71#issuecomment-3770448423
@leerho - I have asked Justin to check my understanding. For others following along: my view is that halving (or, more generally, scaling by some factor in (0, 1]) is an acceptable operation. However, the sketch array holds signed values, so if one sketch were subtracted from another and a negative value appeared, the error bounds would no longer hold. The standard (and well-known) Count-Min error bounds for frequency estimation assume that every entry of the underlying frequency vector is non-negative. A negative value in `sketch(A) - sketch(B)` contradicts this assumption, so the bounds would not be valid. Although the scaling itself is not a problem, it would be good to learn more about the intended application before deciding on the best path forward.
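To make the distinction concrete, here is a minimal sketch under stated assumptions. It is a hypothetical, self-contained Count-Min sketch, not the datasketches-rust API: the type `CountMin` and the methods `halve` and `subtract` are illustrative names, and per-row hashing via `DefaultHasher` over `(row, item)` is an assumed stand-in for proper independent hash functions. Halving keeps every cell at least half of (rounded down) each true count mapped to it, so the min-over-rows estimate still never underestimates the scaled frequency. Subtracting sketches can leave a negative cell, and the estimate can then fall below the true frequency.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical minimal Count-Min sketch (NOT the datasketches-rust API).
/// Cells are signed (i64), mirroring the signed sketch array discussed above.
#[derive(Clone, Debug)]
struct CountMin {
    width: usize,
    table: Vec<Vec<i64>>, // depth rows x width columns
}

impl CountMin {
    /// `depth` must be >= 1 and `width` must be >= 1.
    fn new(depth: usize, width: usize) -> Self {
        CountMin { width, table: vec![vec![0; width]; depth] }
    }

    /// Column for `item` in `row`. Assumption: hashing the (row, item) pair
    /// with the std SipHash hasher approximates independent per-row hashes.
    fn bucket(&self, row: usize, item: u64) -> usize {
        let mut h = DefaultHasher::new();
        (row, item).hash(&mut h);
        (h.finish() % self.width as u64) as usize
    }

    /// Add `weight` to the item's cell in every row.
    fn update(&mut self, item: u64, weight: i64) {
        for row in 0..self.table.len() {
            let col = self.bucket(row, item);
            self.table[row][col] += weight;
        }
    }

    /// Point estimate: minimum over rows. If all true frequencies are
    /// non-negative, this never underestimates the item's frequency.
    fn estimate(&self, item: u64) -> i64 {
        (0..self.table.len())
            .map(|row| self.table[row][self.bucket(row, item)])
            .min()
            .expect("depth must be at least 1")
    }

    /// Scale every cell by 1/2 (integer division truncates toward zero).
    fn halve(&mut self) {
        for row in self.table.iter_mut() {
            for cell in row.iter_mut() {
                *cell /= 2;
            }
        }
    }

    /// Cell-wise difference self - other. Both sketches must share
    /// dimensions (and, implicitly, hashing) for this to be meaningful.
    fn subtract(&self, other: &CountMin) -> CountMin {
        assert_eq!(self.width, other.width);
        assert_eq!(self.table.len(), other.table.len());
        let table = self
            .table
            .iter()
            .zip(&other.table)
            .map(|(ra, rb)| ra.iter().zip(rb).map(|(x, y)| x - y).collect::<Vec<i64>>())
            .collect();
        CountMin { width: self.width, table }
    }
}
```

As a usage note: with a width-1, depth-1 sketch every item shares one cell, so the collision is deterministic. Inserting item 1 with weight 3 into A and item 2 with weight 8 into B gives `A.subtract(&B).estimate(1) == -5`, a negative cell and an underestimate of the true count 3. This is exactly the situation in which the usual one-sided Count-Min bound no longer applies.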
