I disagree.  The bounds on our current sketches do a pretty good job of
predicting the actual error behavior.  That is why we have gone to the
trouble of allowing the user to choose the size of the confidence interval,
and in the case of the Theta sketches we actually track the growth of the
error contour in the early stages of estimation mode.  The better we can
specify the error, the more confident the user will be in the result.

For these early characterization tests it is critical to understand the
error behavior over the entire range of ranks.  We can decide later how to
specify and document the proper use of this sketch.
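
As a rough illustration of what a full-range characterization pass looks
like, here is a minimal sketch in Python.  It is not the actual
characterization code: the "sketch" is replaced by a plain uniform random
sample as a stand-in, and the names characterize, true_rank, sample_size and
num_points are hypothetical.  It only shows the shape of the measurement:
query at evenly spaced ranks from 0.0 to 1.0 and record estimated rank minus
true rank at each point.

```python
import bisect
import random


def true_rank(sorted_values, x):
    """Fraction of values <= x in the full, sorted stream."""
    return bisect.bisect_right(sorted_values, x) / len(sorted_values)


def characterize(n=100_000, sample_size=1_000, num_points=101, seed=1):
    """Return (target_rank, exact_rank, rank_error) at evenly spaced ranks.

    ASSUMPTION: a sorted uniform random sample stands in for the sketch.
    A real run would query the sketch under test instead.
    """
    rng = random.Random(seed)
    stream = sorted(rng.random() for _ in range(n))
    sample = sorted(rng.sample(stream, sample_size))  # stand-in "sketch"

    rows = []
    for i in range(num_points):
        target = i / (num_points - 1)           # 0.0, 0.01, ..., 1.0
        idx = min(int(target * n), n - 1)       # stream item near that rank
        x = stream[idx]
        exact = true_rank(stream, x)
        estimate = bisect.bisect_right(sample, x) / sample_size
        rows.append((target, exact, estimate - exact))
    return rows


if __name__ == "__main__":
    for target, exact, err in characterize()[::10]:
        print(f"rank={target:.2f}  exact={exact:.5f}  error={err:+.5f}")
```

Plotting the error column against rank, over many trials, gives the error
profile across the whole range; the same loop can be restricted to ranks
near 0.0 or 1.0 if only the tail is of interest.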

Lee.


On Sun, Sep 13, 2020 at 1:45 PM Jon Malkin <[email protected]> wrote:

> Our existing bounds, especially for the original quantiles sketch, don't
> necessarily predict the actual error behavior. They're an upper bound on
> it. Especially right before one of the major cascading compaction rounds,
> our error may be substantially better than the bound.
>
> I also feel like characterization plots going from 0.0-1.0 kind of miss
> the point of the relative error quantiles sketch. It's kind of useful to
> know what it's doing everywhere but if you care about that whole range you
> should be using KLL, not this sketch. If you care about accuracy over the
> whole range and also high precision at the tail, you should be using both
> sketches together. The interesting thing with this sketch would be to focus
> only on error in, say, the outer 5% at most. Maybe even a log scale plot
> approaching the edge of the range.
>
>   jon
>
-- 
>From my cell phone.
