Well, in the graph you sent, the LB and UB lines are bounds on the absolute rank error r[i] - tr[i], whereas the flat curves show the error relative to the true rank, i.e., (r[i] - tr[i]) / tr[i]. It is no surprise, then, that their shapes don't match.
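To make the distinction concrete, here is a small Python sketch (the arrays and the 2% error factor are purely illustrative, not values from the library) contrasting the two error measures:

```python
# Toy illustration: estimated ranks r[i] vs. true ranks tr[i].
# LB/UB-style curves bound the absolute error r[i] - tr[i];
# the "flat" curves plot the relative error (r[i] - tr[i]) / tr[i].
tr = [0.10, 0.25, 0.50, 0.75, 0.90]   # true normalized ranks
r = [t * 1.02 for t in tr]            # hypothetical estimates, 2% high

abs_err = [ri - ti for ri, ti in zip(r, tr)]
rel_err = [(ri - ti) / ti for ri, ti in zip(r, tr)]

print(abs_err)  # grows with rank, even though the relative error is constant
print(rel_err)  # flat (~0.02) across all ranks
```

The same underlying estimates produce a rising absolute-error curve and a flat relative-error curve, which is exactly why the two families of curves in the graph cannot match.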
As for the case of small K: I'm aware that the getMaxRSE function returns a value bigger than 1 in that case. This is because the formula for the standard deviation is taken from the paper and has a constant-factor slack, especially for small K. I'd suggest calculating MaxRSE from experiments for, say, K <= 10 (I'm not sure we can provide a better mathematical formula for that case, but I'll think about it).

Cheers,
Pavel

On 2020-09-13 22:19, leerho wrote:

Still, the calculated a priori error bounds for the LRA case look strange.

On Sun, Sep 13, 2020 at 1:09 PM leerho <[email protected]> wrote:

Pavel,

Yes, I was doing something wrong! Sorry! I was confusing the predicted bounds on a rank with the error shown in the graphs, where the absolute rank is subtracted out, showing only the difference. Nonetheless, we do need functions that do not go negative for small K.

Lee.

On Sun, Sep 13, 2020 at 12:50 PM leerho <[email protected]> wrote:

Pavel,

I have looked very carefully at your p-code for getRankUB and getRankLB, and they both produce nonsensical values.

* getMaximumRSE(k) with K = 4 (the smallest value of K), multiplied by 3 SD, gives 1.23, which is > 1. It should never produce values > 1 for any legal value of K; this causes the LB to go negative.
* Given K and SD, the factor (1 +/- SD * RSE) is a constant. Multiplied by the rank, it produces bounds that increase linearly from the lowest rank to the highest. This does not model the error properties of the HRA case, which has decreasing error at high ranks, or the LRA case, where the error is fairly flat over most of the range.
* We need equations that not only predict the actual error behavior, but also never produce a lower bound < 0 for any legal value of K, which these equations will.

Comments:

* This is not a bug, but a user design principle.
I don't think it is a good idea to have the user specify a value to obtain the rank error. We don't want to give the user the idea that the error bounds are related to values at all, only to ranks. I realize we can translate a given value to an estimated rank, but let's have the user do that, and document the caveats of doing so.

* We have been training our users to specify and obtain ranks normalized to [0, 1]; that is how our other quantile sketches work. Most users think in terms of percentiles anyway, so normalized ranks are pretty natural.

Perhaps I am still doing something wrong?

Cheers,
Lee.

On Sat, Sep 12, 2020 at 11:26 AM leerho <[email protected]> wrote:

Hi Pavel,

I think we are in agreement. I take it as a positive that you haven't found any flaw in the implementation of the relative-error quantiles algorithm. What we are discussing now is how to define rank, especially with respect to HRA and LRA, and, philosophically, what kind of error distribution as a function of rank users will want and will find easiest to specify and explain.

"My intuition is that all plots should look roughly the same, up to mirroring along the Y axis."

This statement is a little puzzling, as it will only be true if we choose the definition of rank appropriately when the user selects LRA or HRA. As my plots reveal, if we keep the definition of rank the same, switching from HRA to LRA produces a dramatically different error distribution as a function of rank.

I agree with your formulas for relative error, except that all of our ranks are already normalized by SL, so I would replace SL in your formulas with 1.0. I still want to add your a priori error calculation to these plots to see where it lies relative to the measured error.

I gather from your comments that what you had in mind when writing the paper was a rank error distribution that looks like plots 3 and 4 above, not plots 1 and 2.
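The objection to the getRankLB/getRankUB p-code discussed above can be reproduced in a few lines. This is only a sketch of the multiplicative-bounds form under discussion; the RSE value 0.41 is the illustrative number implied by 3 SD * RSE = 1.23, not a value from the library:

```python
# Hypothetical multiplicative bounds of the form rank * (1 +/- numStdDev * RSE).
# With K = 4 the discussion implies 3 * RSE = 1.23, so the LB factor is
# 1 - 1.23 = -0.23 and the lower bound goes negative for every rank.
def get_rank_lb(rank, rse, num_std_dev):
    return rank * (1.0 - num_std_dev * rse)

def get_rank_ub(rank, rse, num_std_dev):
    return rank * (1.0 + num_std_dev * rse)

rse = 0.41                       # illustrative: 3 * 0.41 = 1.23 > 1
print(get_rank_lb(0.5, rse, 3))  # negative: a nonsensical lower bound on a rank
print(get_rank_ub(0.5, rse, 3))  # greater than 1: also illegal as a rank

# A minimal fix is to clamp the bounds into [0, 1]. This does not repair the
# linear error model criticized above, but it keeps the bounds legal ranks.
def get_rank_lb_clamped(rank, rse, num_std_dev):
    return max(0.0, rank * (1.0 - num_std_dev * rse))

def get_rank_ub_clamped(rank, rse, num_std_dev):
    return min(1.0, rank * (1.0 + num_std_dev * rse))
```

Clamping addresses the "never negative" requirement, but an error model that actually decreases toward the accurate end of the range (HRA or LRA) would still need a rank-dependent RSE rather than a constant factor.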
However, I feel that the rank error distribution shown in plots 1 and 2 would be easier for users to understand. We will probably leave the choice entirely up to the user, but that will require more documentation to explain all the different error behaviors :( .

Lee.
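The earlier suggestion of deriving MaxRSE empirically for small K could be harnessed roughly as follows. This is only a sketch: `estimate_rank` is a stand-in placeholder (here it returns the exact rank, so the measured error is zero); in a real experiment it would be replaced by the rank estimate of a sketch built with the K under test.

```python
import bisect
import random

def true_rank(sorted_data, x):
    """Exact normalized rank of x: fraction of values <= x."""
    return bisect.bisect_right(sorted_data, x) / len(sorted_data)

def estimate_rank(sorted_data, x):
    # PLACEHOLDER: a real experiment would query the sketch here.
    # Using the exact rank makes the measured error zero.
    return true_rank(sorted_data, x)

def measure_max_rse(n=1000, trials=20, seed=42):
    """Empirical maximum relative rank error over random inputs."""
    rng = random.Random(seed)
    worst = 0.0
    for _ in range(trials):
        data = sorted(rng.random() for _ in range(n))
        for x in rng.sample(data, 50):
            tr = true_rank(data, x)
            err = abs(estimate_rank(data, x) - tr) / tr
            worst = max(worst, err)
    return worst

print(measure_max_rse())  # 0.0 with the exact-rank placeholder
```

Running this per K (for K <= 10) against the actual sketch, and taking the worst observed relative error over many trials, would give the experimentally derived MaxRSE values proposed above.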
