Hive does not support that right now, but Hive supports subquery so we can do:
SELECT SQRT( (n*totalsqr - tot * tot) / (n*(n-1)) ) FROM (SELECT COUNT(1) as n, SUM(col) as total, SUM( col*col ) AS totalsqr FROM t) t2; Zheng On Sat, May 30, 2009 at 4:08 PM, Matt Pestritto <[email protected]> wrote: > Ah - Thanks so much. I didn't know you could reference a column that has an > aggregate function call again in the same select statement. This is much > cleaner that the approach that I took. I'll give it a shot. > > Thanks again. > > > On Wed, May 27, 2009 at 4:24 AM, Amr Awadallah <[email protected]> wrote: > >> I agree that a builtin for std dev is a good idea. >> >> that said, you can achieve this easy in one pass, just use: >> >> select sum( pow(col,2) ) as totsqr, sum( col ) as tot, count(1) as n, pow( >> (n*totsqr - pow(tot,2) )/(n*(n-1)), 0.5) as stddev >> from .... >> >> >> Matt Pestritto wrote: >> >>> Hi. >>> >>> Are there plans to write a standard deviation aggregate function ? I had >>> to build my own which translated into multiple hive queries. While it >>> works, a build-in function would have been much easier. >>> >>> Thanks >>> -Matt >>> >> > -- Yours, Zheng
