Andy

I don't have a published paper yet, but I do have an unpublished draft.  See 
github.com/tdunning/t-digest/docs

The paper does not have rigorous bounds, but it does have some good empirical 
comparisons.  Bounds for randomly ordered input should be pretty 
straightforward.  The algorithm performs similarly on perversely ordered 
input, however, so it would be nice to have bounds that cover that case too.  
Repeated values are also tricky to put bounds on.  Empirically, accuracy is 
several orders of magnitude better than q-digest for extreme quantiles and 
1-2 orders of magnitude better near the median. 

T-digest is already committed to Mahout, and I have a pull request in to 
stream-lib.  The Algebird guys are looking to reimplement it in Scala. 


> On Dec 2, 2013, at 0:35, Andy Twigg <[email protected]> wrote:
> 
> I've not heard of t-digest before. Are there any theoretical
> guarantees on its performance? Can you point me to a published paper?
> 
>> On 2 December 2013 06:55, Ted Dunning (JIRA) <[email protected]> wrote:
>> 
>>     [ https://issues.apache.org/jira/browse/MAHOUT-1368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
>> 
>> Ted Dunning updated MAHOUT-1368:
>> --------------------------------
>> 
>>    Attachment: MAHOUT-1368.patch
>> 
>> 
>> Here is a patch with an additional test on skewed data.
>> 
>>> Convert OnlineSummarizer to use the new TDigest
>>> -----------------------------------------------
>>> 
>>>                Key: MAHOUT-1368
>>>                URL: https://issues.apache.org/jira/browse/MAHOUT-1368
>>>            Project: Mahout
>>>         Issue Type: Bug
>>>           Reporter: Ted Dunning
>>>            Fix For: 0.9
>>> 
>>>        Attachments: MAHOUT-1368.patch
>>> 
>>> 
>>> The new TDigest provides better accuracy for quartile estimation as well as 
>>> producing any other quantile you might like.  The current quartile 
>>> estimation of the OnlineSummarizer fails for highly skewed distributions 
>>> and can't really be extended to provide other quantiles.  The TDigest 
>>> handles all of this.
>> 
>> 
>> 
>> --
>> This message was sent by Atlassian JIRA
>> (v6.1#6144)
