> I've not heard of t-digest before

Because it's Ted's new baby, perhaps? ;) See https://github.com/tdunning/t-digest
The repository contains the TeX source and a PDF of the paper:
https://github.com/tdunning/t-digest/blob/master/docs/theory/t-digest-paper/histo.pdf

Dawid

On Mon, Dec 2, 2013 at 9:35 AM, Andy Twigg <[email protected]> wrote:
> I've not heard of t-digest before. Are there any theoretical
> guarantees on its performance? Can you point me to a published paper?
>
> On 2 December 2013 06:55, Ted Dunning (JIRA) <[email protected]> wrote:
>>
>> [ https://issues.apache.org/jira/browse/MAHOUT-1368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
>>
>> Ted Dunning updated MAHOUT-1368:
>> --------------------------------
>>
>>     Attachment: MAHOUT-1368.patch
>>
>> Here is a patch with an additional skewed test.
>>
>>> Convert OnlineSummarizer to use the new TDigest
>>> -----------------------------------------------
>>>
>>>                 Key: MAHOUT-1368
>>>                 URL: https://issues.apache.org/jira/browse/MAHOUT-1368
>>>             Project: Mahout
>>>          Issue Type: Bug
>>>            Reporter: Ted Dunning
>>>             Fix For: 0.9
>>>
>>>         Attachments: MAHOUT-1368.patch
>>>
>>>
>>> The new TDigest provides better accuracy for quartile estimation as well as
>>> producing any other quantile you might like. The current quartile
>>> estimation of the OnlineSummarizer fails for highly skewed distributions
>>> and can't really be extended to provide other quantiles. The TDigest
>>> handles all of this.
>>
>>
>>
>> --
>> This message was sent by Atlassian JIRA
>> (v6.1#6144)
>
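The thread points to the paper rather than spelling out the idea, so here is a minimal Java sketch of the centroid-sizing rule behind the JIRA claim that a digest copes with skewed distributions and can answer any quantile, not just quartiles. `SimpleDigest`, `size` and `quantile` are hypothetical names, not the t-digest library's API. The size bound 4 * n * q(1 - q) / compression is modelled on the q(1 - q) bound described for t-digest; treat the exact constant as an assumption. Unlike a real TDigest, this sketch sorts the whole input up front, so it illustrates accuracy in the tails, not streaming or bounded memory.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical, simplified batch illustration of the t-digest centroid-sizing idea.
// NOT the t-digest library API and not a streaming implementation: it sorts the
// whole input, so it demonstrates accuracy behaviour, not memory savings.
final class SimpleDigest {
    // Each centroid is {mean, weight}, kept in ascending order of mean.
    private final List<double[]> centroids = new ArrayList<>();
    private final double total;

    SimpleDigest(double[] data, double compression) {
        double[] sorted = data.clone();
        Arrays.sort(sorted);
        total = sorted.length;
        double before = 0; // total weight of centroids already emitted
        double sum = 0;    // sum of points in the centroid being built
        double weight = 0; // number of points in the centroid being built
        for (double x : sorted) {
            // Approximate quantile of the growing centroid's center if x joins it.
            double q = (before + (weight + 1) / 2.0) / total;
            // Assumed size bound, proportional to q(1 - q): large centroids near the
            // median, shrinking down to single points near q = 0 and q = 1.
            double limit = Math.max(1.0, 4.0 * total * q * (1 - q) / compression);
            if (weight > 0 && weight + 1 > limit) {
                centroids.add(new double[] {sum / weight, weight});
                before += weight;
                sum = 0;
                weight = 0;
            }
            sum += x;
            weight += 1;
        }
        if (weight > 0) {
            centroids.add(new double[] {sum / weight, weight});
        }
    }

    int size() {
        return centroids.size();
    }

    // Estimates the q-th quantile by interpolating linearly between the means of
    // neighbouring centroids, each placed at the midpoint of its rank range.
    double quantile(double q) {
        if (centroids.isEmpty()) throw new IllegalStateException("empty digest");
        if (q < 0 || q > 1) throw new IllegalArgumentException("q must be in [0, 1]");
        double target = q * total;
        double cumulative = 0;
        double prevCenter = Double.NaN;
        double prevMean = Double.NaN;
        for (double[] c : centroids) {
            double center = cumulative + c[1] / 2.0;
            if (target <= center) {
                if (Double.isNaN(prevCenter)) return c[0]; // before the first center
                double t = (target - prevCenter) / (center - prevCenter);
                return prevMean + t * (c[0] - prevMean);
            }
            prevCenter = center;
            prevMean = c[0];
            cumulative += c[1];
        }
        return prevMean; // beyond the last center
    }

    public static void main(String[] args) {
        // Skewed (exponential-shaped) sample, generated deterministically.
        int n = 10000;
        double[] data = new double[n];
        for (int i = 0; i < n; i++) data[i] = -Math.log(1 - (i + 0.5) / n);
        SimpleDigest digest = new SimpleDigest(data, 100);
        System.out.println("centroids: " + digest.size());
        for (double q : new double[] {0.25, 0.5, 0.75, 0.999}) {
            System.out.println(q + " -> " + digest.quantile(q));
        }
    }
}
```

Because centroid size shrinks toward both ends of the distribution, an extreme quantile such as 0.999 is interpolated between single points, while the median is interpolated between centroids of up to about 100 points in this example. That is the property that lets a digest answer arbitrary quantiles on skewed data instead of only fixed quartiles.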
