> I've not heard of t-digest before

Because it's Ted's new baby, perhaps? ;) See
https://github.com/tdunning/t-digest

The repository contains the TeX source and a PDF of the paper:

https://github.com/tdunning/t-digest/blob/master/docs/theory/t-digest-paper/histo.pdf

Dawid

On Mon, Dec 2, 2013 at 9:35 AM, Andy Twigg <[email protected]> wrote:
> I've not heard of t-digest before. Are there any theoretical
> guarantees on its performance? Can you point me to a published paper?
>
> On 2 December 2013 06:55, Ted Dunning (JIRA) <[email protected]> wrote:
>>
>>      [ https://issues.apache.org/jira/browse/MAHOUT-1368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
>>
>> Ted Dunning updated MAHOUT-1368:
>> --------------------------------
>>
>>     Attachment: MAHOUT-1368.patch
>>
>>
>> Here is a patch with an additional skewed test.
>>
>>> Convert OnlineSummarizer to use the new TDigest
>>> -----------------------------------------------
>>>
>>>                 Key: MAHOUT-1368
>>>                 URL: https://issues.apache.org/jira/browse/MAHOUT-1368
>>>             Project: Mahout
>>>          Issue Type: Bug
>>>            Reporter: Ted Dunning
>>>             Fix For: 0.9
>>>
>>>         Attachments: MAHOUT-1368.patch
>>>
>>>
>>> The new TDigest provides better accuracy for quartile estimation as well as 
>>> producing any other quantile you might like.  The current quartile 
>>> estimation of the OnlineSummarizer fails for highly skewed distributions 
>>> and can't really be extended to provide other quantiles.  The TDigest 
>>> handles all of this.
>>
>>
>>
>> --
>> This message was sent by Atlassian JIRA
>> (v6.1#6144)
>
