[jira] Commented: (HIVE-607) Create statistical UDFs.

Emil Ibrishimov (JIRA) Mon, 27 Jul 2009 19:02:49 -0700

    [ 
https://issues.apache.org/jira/browse/HIVE-607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12735890#action_12735890
 ]


Emil Ibrishimov commented on HIVE-607:
--------------------------------------

Hey Scott. The formula you are using has precision problems when the variance 
is very small relative to the sum of squares: devavg and avg*avg can both get 
very large while the variance itself stays small, so subtracting one from the 
other loses most of the significant digits, and the result can sometimes even 
come out negative.
I am using a modification of the on-line algorithm described at 
http://en.wikipedia.org/wiki/Algorithms_for_calculating_variance#On-line_algorithm
which fixes this problem.
I will attach a patch tomorrow when I'm done testing it.
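
For illustration, here is a minimal sketch of the unmodified on-line (Welford) 
update from the Wikipedia page linked above, next to the sum-of-squares formula 
for comparison. It is not the patch for this issue, and the modification 
mentioned in the comment is not shown. The class and method names 
(OnlineVariance, add, variancePop, varianceSamp, naiveVariancePop) are 
hypothetical and are not taken from UDAFStddev.java.

```java
/** Sketch only: not the HIVE-607 patch. All names here are hypothetical. */
public class OnlineVariance {
    private long count = 0;
    private double mean = 0.0;
    // Running sum of squared deviations from the current mean.
    private double m2 = 0.0;

    /** Welford's on-line update for a single value. */
    public void add(double x) {
        count++;
        double delta = x - mean;   // deviation from the old mean
        mean += delta / count;     // update the mean
        m2 += delta * (x - mean);  // uses old and new mean; never subtracts large sums
    }

    /** Population variance (divide by n). */
    public double variancePop() {
        return count > 0 ? m2 / count : Double.NaN;
    }

    /** Sample variance (divide by n - 1). */
    public double varianceSamp() {
        return count > 1 ? m2 / (count - 1) : Double.NaN;
    }

    /** Sum-of-squares formula E[x^2] - E[x]^2, shown only for comparison. */
    public static double naiveVariancePop(double[] xs) {
        double sum = 0.0;
        double sumSq = 0.0;
        for (double x : xs) {
            sum += x;
            sumSq += x * x;
        }
        double avg = sum / xs.length;
        return sumSq / xs.length - avg * avg;
    }

    public static void main(String[] args) {
        // Large values with a small spread: true population variance is 22.5.
        double[] xs = {1e9 + 4, 1e9 + 7, 1e9 + 13, 1e9 + 16};
        OnlineVariance ov = new OnlineVariance();
        for (double x : xs) {
            ov.add(x);
        }
        System.out.println("on-line: " + ov.variancePop());
        System.out.println("naive:   " + naiveVariancePop(xs));
    }
}
```

With this input the on-line version recovers the population variance of 22.5, 
whereas the sum-of-squares version works with intermediate values near 1e18, 
where a double can no longer resolve differences that small.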

> Create statistical UDFs.
> ------------------------
>
>                 Key: HIVE-607
>                 URL: https://issues.apache.org/jira/browse/HIVE-607
>             Project: Hadoop Hive
>          Issue Type: New Feature
>          Components: Query Processor
>            Reporter: S. Alex Smith
>            Assignee: Emil Ibrishimov
>            Priority: Minor
>         Attachments: UDAFStddev.java
>
>
> Create UDFs replicating:
> STD()                   Return the population standard deviation
> STDDEV_POP() (v5.0.3)   Return the population standard deviation
> STDDEV_SAMP() (v5.0.3)  Return the sample standard deviation
> STDDEV()                Return the population standard deviation
> SUM()                   Return the sum
> VAR_POP() (v5.0.3)      Return the population standard variance
> VAR_SAMP() (v5.0.3)     Return the sample variance
> VARIANCE() (v4.1)       Return the population standard variance
> as found at http://dev.mysql.com/doc/refman/5.0/en/group-by-functions.html.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
