[ https://issues.apache.org/jira/browse/MAHOUT-1361?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13830483#comment-13830483 ]
Ted Dunning commented on MAHOUT-1361:
-------------------------------------
{quote}
it's my understanding current code works on double values (integers).
{quote}
It works on doubles, not integers (casting works, of course).
{quote}
Do you think it is possible to adapt it to a lexicographical order of unlimited
values?
{quote}
Yes, except that you need to define what the mean of a bunch of lexicographically
ordered values might be. I need an online-updatable quantity that works for that. It
might be as simple as keeping 11 random values via reservoir sampling, but I don't
see a compelling use case for that yet.
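
A minimal sketch of the reservoir-sampling idea, not taken from the attached patch: it keeps a uniform random sample of at most 11 values per cluster and updates it online. The class name `ReservoirSample`, its methods, and the choice of the sample's middle element as the stand-in for a mean are all assumptions made for illustration; the comment above only suggests "11 random values and reservoir sampling".

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

/** Hypothetical helper: a uniform random sample of up to `capacity` values seen so far. */
final class ReservoirSample<T extends Comparable<T>> {
    private final int capacity;
    private final List<T> sample;
    private final Random random;
    private long seen = 0;

    ReservoirSample(int capacity, long seed) {
        this.capacity = capacity;
        this.sample = new ArrayList<>(capacity);
        this.random = new Random(seed);
    }

    /** Algorithm R: the n-th value enters the sample with probability capacity / n. */
    void add(T value) {
        seen++;
        if (sample.size() < capacity) {
            sample.add(value);
        } else {
            long j = (long) (random.nextDouble() * seen); // uniform in [0, seen)
            if (j < capacity) {
                sample.set((int) j, value);
            }
        }
    }

    /** Total number of values offered, including those not retained. */
    long count() {
        return seen;
    }

    /** Number of values currently retained (never more than capacity). */
    int size() {
        return sample.size();
    }

    /** Assumed stand-in for a mean: the element at index size / 2 of the sorted sample. */
    T representative() {
        if (sample.isEmpty()) {
            throw new IllegalStateException("no values added");
        }
        List<T> sorted = new ArrayList<>(sample);
        Collections.sort(sorted);
        return sorted.get(sorted.size() / 2);
    }
}
```

Under these assumptions, each cluster would hold a `new ReservoirSample<String>(11, seed)`, call `add` for every value assigned to it, and use `representative()` wherever the double-valued code uses the centroid mean. Any `Comparable` type works, so the ordering can be lexicographical.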
> Online algorithm for computing accurate Quantiles using 1-D clustering
> ----------------------------------------------------------------------
>
> Key: MAHOUT-1361
> URL: https://issues.apache.org/jira/browse/MAHOUT-1361
> Project: Mahout
> Issue Type: New Feature
> Components: Math
> Affects Versions: 0.9
> Reporter: Suneel Marthi
> Assignee: Suneel Marthi
> Fix For: 0.9
>
> Attachments: MAHOUT-1361.patch
>
>
> Implementation of Ted Dunning's paper and initial work on this subject. See
> https://github.com/tdunning/t-digest/blob/master/docs/theory/t-digest-paper/histo.pdf
> for the paper.
> An on-line algorithm for computing approximations of rank-based statistics
> that allows controllable accuracy. This algorithm can also be used to compute
> hybrid statistics such as trimmed means in addition to computing arbitrary
> quantiles.