leerho commented on code in PR #58:
URL: https://github.com/apache/datasketches-python/pull/58#discussion_r1917473640

##########
docs/source/quantiles/index.rst:
##########
@@ -10,17 +10,21 @@ in the stream.
 These sketches may be used to compute approximate histograms, Probability Mass Functions (PMFs), or Cumulative Distribution Functions (CDFs).
 
-The library provides three types of quantiles sketches, each of which has generic items as well as versions
-specific to a given numeric type (e.g. integer or floating point values). All three types provide error
-bounds on rank estimation with proven probabilistic error distributions.
+The library provides four types of quantiles sketches, three of which have generic items as well as versions
+specific to a given numeric type (e.g. integer or floating point values). Those three types provide error
+bounds on rank estimation with proven probabilistic error distributions. t-digest is a heuristic-based sketch
+that works only on numeric data, and while the error properties are not guaranteed, the sketch typically
+does a good job with small storage.
 
- * KLL: Provides uniform rank estimation error over the entire range
+ * KLL: Provides uniform rank estimation error over the entire range.
  * REQ: Provides relative rank error estimates, which decreases approaching either the high or low end values.
+ * t-digest: Relative rank error estimates, heuristic-based without guarantees but quite compact with generally very good error properties.

Review Comment:
   ...(add) with large enough data.

##########
docs/source/quantiles/tdigest.rst:
##########
@@ -0,0 +1,50 @@
+t-digest
+--------
+
+.. currentmodule:: datasketches
+
+The implementation in this library is based on the MergingDigest described in
+`Computing Extremely Accurate Quantiles Using t-Digests <https://arxiv.org/abs/1902.04023>`_ by Ted Dunning and Otmar Ertl.
+
+The implementation in this library has a few differences from the reference implementation associated with that paper:
+
+* Merge does not modify the input
+* Derialization similar to other sketches in this library, although reading the reference implementation format is supported
+
+Unlike all other algorithms in the library, t-digest is empirical and has no mathematical basis for estimating its error
+and its results are dependent on the input data. However, for many common data distributions, it can produce excellent results.

Review Comment:
   ...(add) with large enough data.

##########
docs/source/quantiles/index.rst:
##########
@@ -10,17 +10,21 @@ in the stream.
 These sketches may be used to compute approximate histograms, Probability Mass Functions (PMFs), or Cumulative Distribution Functions (CDFs).
 
-The library provides three types of quantiles sketches, each of which has generic items as well as versions
-specific to a given numeric type (e.g. integer or floating point values). All three types provide error
-bounds on rank estimation with proven probabilistic error distributions.
+The library provides four types of quantiles sketches, three of which have generic items as well as versions
+specific to a given numeric type (e.g. integer or floating point values). Those three types provide error
+bounds on rank estimation with proven probabilistic error distributions. t-digest is a heuristic-based sketch
+that works only on numeric data, and while the error properties are not guaranteed, the sketch typically
+does a good job with small storage.

Review Comment:
   ...(add) and large enough input data.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscr...@datasketches.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org