proost opened a new issue, #96: URL: https://github.com/apache/datasketches-go/issues/96
# Summary Add Go version t-digest. # Design ## Double Adding Double Precision t-digest. Because Go doesn’t support meta programming in C++, pairing (float32, uint32) and (float64, uint64) is difficult. In this time, i will add double precision only. ```go // Double provides an implementation of t-Digest double precision type for estimating quantiles and ranks. // This implementation is based on the paper: // Ted Dunning, Otmar Ertl. "Extremely Accurate Quantiles Using t-Digests" // and the reference implementation: https://github.com/tdunning/t-digest // NOTE: This implementation is similar to MergingDigest in the above Java implementation type Double struct { min float64 max float64 centroids []doublePrecisionCentroid buffer []float64 centroidsWeight uint64 centroidsCapacity int k uint16 reverseMerge bool } type doublePrecisionCentroid struct { mean float64 weight uint64 } func (c *doublePrecisionCentroid) add(other doublePrecisionCentroid) { c.weight += other.weight c.mean += (other.mean - c.mean) * float64(other.weight) / float64(c.weight) } // Update updates a value to the TDigest func (d *Double) Update(value float64) error // Merge merges another Double into this one func (d *Double) Merge(other *Double) error // IsEmpty returns true if the TDigest has not seen any data func (d *Double) IsEmpty() bool // MinValue returns the minimum value seen by the TDigest func (d *Double) MinValue() (float64, error) // MaxValue returns the maximum value seen by the TDigest func (d *Double) MaxValue() (float64, error) // TotalWeight returns the total weight of all values func (d *Double) TotalWeight() uint64 // K returns the compression parameter k func (d *Double) K() uint16 // Rank computes the approximate normalized rank of the given value func (d *Double) Rank(value float64) (float64, error) // Quantile computes the approximate quantile value corresponding to the given normalized rank func (d *Double) Quantile(rank float64) (float64, error) // PMF returns an approximation to the Probability Mass Function (PMF) // of the input stream. func (d *Double) PMF(splitPoints []float64) ([]float64, error) // CDF returns an approximation to the Cumulative Distribution Function (CDF) // which is the cumulative analog of the PMF of the input stream. func (d *Double) CDF(splitPoints []float64) ([]float64, error) // String returns a human-readable summary of the TDigest func (d *Double) String(shouldPrintCentroids bool) string // SerializedSizeBytes computes the serialized size in bytes of the TDigest. func (d *Double) SerializedSizeBytes(withBuffer bool) int ``` ## Serialization/Deserialization - I will use Encoder/Decoder pattern like it was before. # Release Schedule I sent a [PR](https://github.com/apache/datasketches-cpp/pull/471) and discussion opens. When finish the PR, I will start upload implementation. I will upload 2 PRs. 1. A PR for double precision t-digest. 2. A PR for and serialization / deserialization. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
