tdunning commented on issue #409:
URL: 
https://github.com/apache/datasketches-cpp/issues/409#issuecomment-1853215209

   The need for stability comes from the need to preserve the t-digest
   invariant which is defined by a scale function.
   
   The scale function is a function that maps q (the quantile) to k (a scaled
   version. Scale functions look like kind of like this:
   
   [image: image.png]
   The t-digest invariant says that the k-size of any centroid must be less
   than one except if the weight of the centroid is 1.
   
   The k=size of a centroid is k(q_right) - k(q_left) where q_left is the
   value of q on the left side of the centroid and q_right is the value on the
   right. Because the scale function is steep near q=0 and q=1, the k-size of
   a cluster with constant weight is bigger if it is near the ends rather than
   near the middle. Defining the bound in terms of the scale function is what
   constrains the centroid sizes so that interpolation near these extremes
   will give good results. The invariant also bounds the size of the t-digest
   for suitable scale functions (see https://arxiv.org/pdf/1903.09921.pdf for
   details).
   
   This is all very simple when all of the samples that we use to create the
   t-digest are distinct because that means all of the centroids will have
   different means and sorting will work predictably.
   
   But what if we have 1000 samples all at the same point? The sizes of the
   centroids in the resulting t-digest near the center will be bigger than the
   ones at the edge. That's good. But if we sort those centroids purely by
   their means (which are all the same) with an unstable sort, they could get
   all jumbled. That could put the biggest centroids near the edge where their
   k-size would be much larger than it was near the center.
   
   Does this make sense?
   
   
   
   On Tue, Dec 12, 2023 at 10:42 AM Alexander Saydakov <
   ***@***.***> wrote:
   
   > As I understand, we should not even provide a public add() method with
   > weight.
   > Of course I am going to call std::sort or std::stable_sort. I am afraid I
   > do not quite get the importance of stable sort and what exactly the
   > invariant is.
   >
   > —
   > Reply to this email directly, view it on GitHub
   > 
<https://github.com/apache/datasketches-cpp/issues/409#issuecomment-1852608805>,
   > or unsubscribe
   > 
<https://github.com/notifications/unsubscribe-auth/AAB5E6RBVBVGDRBCSZ6WVFDYJCQR7AVCNFSM6AAAAABANDLXNOVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTQNJSGYYDQOBQGU>
   > .
   > You are receiving this because you commented.Message ID:
   > ***@***.***>
   >
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to