[ 
https://issues.apache.org/jira/browse/MAHOUT-1361?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13828500#comment-13828500
 ] 

Ted Dunning commented on MAHOUT-1361:
-------------------------------------

Otis,

I should put merging into the paper.  Omitting it was an oversight.

Yes.  It is quite possible.  The basic idea is that you concatenate the 
sub-digests and compact the result exactly as you do in the sequential points 
case where the number of centroids grows too large.  The compaction is done by 
taking centroids in random order and adding them to a new digest.  The slightly 
sophistication is that the centroid being added can be added in pieces to 
several other centroids if they are the closest ones.  This cleverness will 
probably have nearly negligible effect, but it will guarantee the invariant 
more precisely than if you create a new centroid from the centroid if it has 
too much weight to fit on any of the equally near existing centroids.


> Online algorithm for computing accurate Quantiles using 1-D clustering
> ----------------------------------------------------------------------
>
>                 Key: MAHOUT-1361
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-1361
>             Project: Mahout
>          Issue Type: New Feature
>          Components: Math
>    Affects Versions: 0.9
>            Reporter: Suneel Marthi
>            Assignee: Suneel Marthi
>             Fix For: 0.9
>
>         Attachments: MAHOUT-1361.patch
>
>
> Implementation of Ted Dunning's paper and initial work on this subject. See 
> https://github.com/tdunning/t-digest/blob/master/docs/theory/t-digest-paper/histo.pdf
>  for the paper.
> An on-line algorithm for computing approximations of rank-based statistics 
> that allows controllable accuracy. This algorithm can also be used to compute 
> hybrid statistics such as trimmed means in addition to computing arbitrary 
> quantiles.



--
This message was sent by Atlassian JIRA
(v6.1#6144)

Reply via email to