tdunning commented on issue #409:
URL: https://github.com/apache/datasketches-cpp/issues/409#issuecomment-1852308728

   There are two questions here. One concerns updates with weight greater 
than 1, and the other concerns stable sorting.
   
   The first issue is inherent in adding samples with greater than unit weight. 
Unless the digest can split such a sample into multiple updates, the invariant 
can easily be violated. In the test in question, a single sample at 1 is given 
with weight 2. Inserting this as is results in a single centroid with weight 
2, which violates the invariant that every centroid must either have properly 
scaled weight or have weight 1. If we were allowed to split this into two 
samples at the same point, the invariant could be preserved. The problem with 
that is that we don't really know whether this is two samples at the same value 
or two samples with that mean value. That might be resolved by documenting that 
weighted updates should not be used to enter mean values.
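
   To make the splitting idea concrete, here is a minimal sketch. The 
`Centroid` struct and `buffer_weighted_update` are hypothetical names used 
only for illustration and are not the datasketches-cpp API. The sketch assumes 
the caller really means "`weight` samples exactly at `value`", which is the 
documented restriction suggested above.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical centroid type for illustration only; not the library's class.
struct Centroid {
  double mean;
  std::uint64_t weight;
};

// Buffers an update of the given weight as that many unit-weight centroids.
// Assumption: the input means "weight samples exactly at value", not
// "weight samples whose mean is value".
void buffer_weighted_update(std::vector<Centroid>& buffer, double value,
                            std::uint64_t weight) {
  assert(weight > 0);
  for (std::uint64_t i = 0; i < weight; ++i) {
    buffer.push_back(Centroid{value, 1});
  }
}
```

   Each buffered entry has weight 1, so the later merge is free to combine 
them only as far as the invariant allows. The cost is linear in the weight, 
so this would be a poor fit for very large weights.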
   
   Regarding the stable sort, this is also required to avoid violating the 
invariant. Take the case where you have a digest generated from a bunch of 
samples at the same point. These samples will be gathered into centroids as the 
invariant allows, but all centroids created that way will have the same mean 
value. If you add a bunch more samples and (stably) sort the new samples into 
the old centroids, things will be fine as long as the order of the old centroids 
is preserved, since you can add unit-weight centroids anywhere you like without 
violating the invariant. On the other hand, if you use an unstable sort, you can 
move the big centroids away from the center of the distribution and thus 
violate the invariant.
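
   A small sketch of the ordering argument, again with a hypothetical 
`Centroid` struct and a hypothetical `sort_for_merge` helper rather than the 
library's actual merge code. It assumes the existing centroids are already 
ordered by mean and compares on the mean only, so `std::stable_sort` keeps 
tied centroids in their original relative order.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical centroid type for illustration only; not the library's class.
struct Centroid {
  double mean;
  std::uint64_t weight;
};

// Combines existing centroids with incoming unit-weight centroids and orders
// them by mean. std::stable_sort guarantees that centroids with equal means
// keep their relative order, so a heavy centroid stays between the light
// centroids that surrounded it before the new samples arrived.
std::vector<Centroid> sort_for_merge(std::vector<Centroid> existing,
                                     const std::vector<Centroid>& incoming) {
  auto by_mean = [](const Centroid& a, const Centroid& b) {
    return a.mean < b.mean;
  };
  // Assumed precondition: the existing centroids are already ordered by mean.
  assert(std::is_sorted(existing.begin(), existing.end(), by_mean));
  existing.insert(existing.end(), incoming.begin(), incoming.end());
  std::stable_sort(existing.begin(), existing.end(), by_mean);
  return existing;
}
```

   With `std::sort` in place of `std::stable_sort`, the relative order of the 
tied centroids is unspecified, which is how a heavy centroid could end up at 
the edge of a run of equal means.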
   
   These sorts of things are the biggest reason for having so many tests that 
focus on repeated values.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

