tdunning commented on issue #409: URL: https://github.com/apache/datasketches-cpp/issues/409#issuecomment-1852308728
There are two questions here. One is about updates with weight greater than 1, and the other is about stable sorting.

The first issue is inherent in adding samples with greater than unit weight. Unless the digest can split such a sample into multiple updates, the invariant can easily be violated. In the test in question, a single sample at 1 is given with weight 2. Inserting this as is results in a single centroid with weight 2, which violates the invariant that every centroid must either have properly scaled weight or have weight 1. If we were allowed to split this into two samples at the same point, the invariant could be preserved. The problem is that we don't really know whether this is two samples at the same value or two samples with that mean value. That might be resolved by documenting that weighted updates should not be used to enter mean values.

Regarding the stable sort, this is also required to avoid violating the invariant. Take the case where you have a digest generated from a bunch of samples at the same point. These samples will be gathered into centroids as the invariant allows, but all centroids created that way will have the same mean value. If you add a bunch more samples and (stably) sort the new samples into the old centroids, things will be fine as long as the order of the old centroids is preserved, since you can add unit-weight centroids anywhere you like without violating the invariant. On the other hand, if you use an unstable sort, you can move the big centroids away from the center of the distribution and thus violate the invariant.

Cases like these are the biggest reason for having so many tests that focus on repeated values.

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
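Both points in the comment above can be checked with a small sketch. This is not the datasketches-cpp implementation: `Centroid`, `k1`, `satisfies_invariant`, the compression value 8, and the specific weights are all assumptions chosen for illustration. The scale function used is k_1(q) = delta / (2*pi) * asin(2q - 1) from the t-digest paper, and the invariant is taken to be "a centroid has weight 1, or it spans at most one unit of k".

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical centroid type for illustration; not the library's own class.
struct Centroid {
    double mean;
    double weight;
};

// Scale function k_1: k(q) = delta / (2*pi) * asin(2q - 1).
double k1(double q, double delta) {
    const double pi = std::acos(-1.0);
    return delta / (2.0 * pi) * std::asin(2.0 * q - 1.0);
}

// Assumed invariant: each centroid has weight 1 or covers at most one unit of k.
bool satisfies_invariant(const std::vector<Centroid>& cs, double delta) {
    double total = 0.0;
    for (const Centroid& c : cs) total += c.weight;
    assert(total > 0.0);
    double seen = 0.0;
    for (const Centroid& c : cs) {
        const double q_left = seen / total;
        const double q_right = std::min(1.0, (seen + c.weight) / total);
        if (c.weight > 1.0 && k1(q_right, delta) - k1(q_left, delta) > 1.0) {
            return false;
        }
        seen += c.weight;
    }
    return true;
}

// Five old centroids at the same mean (heaviest in the middle),
// followed by three new unit-weight samples at that mean.
std::vector<Centroid> merged_input() {
    return {{1.0, 1.0}, {1.0, 2.0}, {1.0, 3.0}, {1.0, 2.0}, {1.0, 1.0},
            {1.0, 1.0}, {1.0, 1.0}, {1.0, 1.0}};
}

// A stable sort keeps equal-mean centroids in their original relative order.
std::vector<Centroid> stable_order() {
    std::vector<Centroid> cs = merged_input();
    std::stable_sort(cs.begin(), cs.end(),
                     [](const Centroid& a, const Centroid& b) { return a.mean < b.mean; });
    return cs;
}

// One ordering an unstable sort is permitted to return, built by hand so the
// example is deterministic: the weight-3 centroid is moved to the tail.
std::vector<Centroid> possible_unstable_order() {
    std::vector<Centroid> cs = merged_input();
    std::rotate(cs.begin() + 2, cs.begin() + 3, cs.end());
    return cs;
}
```

Under these assumptions, `satisfies_invariant(stable_order(), 8.0)` holds, while `satisfies_invariant(possible_unstable_order(), 8.0)` fails because the weight-3 centroid now covers the quantile range from 0.75 to 1, where k changes quickly. The same checker also rejects a lone centroid of weight 2 and accepts the same weight split into two unit centroids, which is the weighted-update case from the test.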
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at: [email protected]
