Re: Built - In Aggregate Function - Standard Deviation

Amr Awadallah Sun, 31 May 2009 00:04:54 -0700

Zheng,

is SQRT an undocumented builtin UDF? I couldn't see it in the langmanual at:


http://wiki.apache.org/hadoop/Hive/LanguageManual/UDF

hence why I did pow(x,0.5) instead

-- amr

Hive does not support that right now, but Hive supports subquery so wecan do:


SELECT SQRT( (n*totalsqr - tot * tot) / (n*(n-1)) )

FROM (SELECT COUNT(1) as n, SUM(col) as total, SUM( col*col ) AStotalsqr FROM t) t2;


Zheng

On Sat, May 30, 2009 at 4:08 PM, Matt Pestritto <[email protected]<mailto:[email protected]>> wrote:


    Ah - Thanks so much. I didn't know you could reference a column
    that has an aggregate function call again in the same select

statement. This is much cleaner that the approach that I took.I'll give it a shot.


    Thanks again.


    On Wed, May 27, 2009 at 4:24 AM, Amr Awadallah <[email protected]
    <mailto:[email protected]>> wrote:

        I agree that a builtin for std dev is a good idea.

        that said, you can achieve this easy in one pass, just use:

        select sum( pow(col,2) ) as totsqr, sum( col ) as tot,
        count(1) as n, pow( (n*totsqr - pow(tot,2) )/(n*(n-1)), 0.5)
        as stddev
        from ....


        Matt Pestritto wrote:

            Hi.

            Are there plans to write a standard deviation aggregate
            function ?  I had to build my own which translated into
            multiple hive queries.  While it works, a build-in
            function would have been much easier.

            Thanks
            -Matt





--
Yours,
Zheng

Re: Built - In Aggregate Function - Standard Deviation

Reply via email to