Zheng,
is SQRT an undocumented builtin UDF? I couldn't see it in the lang
manual at:
http://wiki.apache.org/hadoop/Hive/LanguageManual/UDF
hence why I did pow(x,0.5) instead
-- amr
Hive does not support that right now, but Hive supports subquery so we
can do:
SELECT SQRT( (n*totalsqr - tot * tot) / (n*(n-1)) )
FROM (SELECT COUNT(1) as n, SUM(col) as total, SUM( col*col ) AS
totalsqr FROM t) t2;
Zheng
On Sat, May 30, 2009 at 4:08 PM, Matt Pestritto <[email protected]
<mailto:[email protected]>> wrote:
Ah - Thanks so much. I didn't know you could reference a column
that has an aggregate function call again in the same select
statement. This is much cleaner that the approach that I took.
I'll give it a shot.
Thanks again.
On Wed, May 27, 2009 at 4:24 AM, Amr Awadallah <[email protected]
<mailto:[email protected]>> wrote:
I agree that a builtin for std dev is a good idea.
that said, you can achieve this easy in one pass, just use:
select sum( pow(col,2) ) as totsqr, sum( col ) as tot,
count(1) as n, pow( (n*totsqr - pow(tot,2) )/(n*(n-1)), 0.5)
as stddev
from ....
Matt Pestritto wrote:
Hi.
Are there plans to write a standard deviation aggregate
function ? I had to build my own which translated into
multiple hive queries. While it works, a build-in
function would have been much easier.
Thanks
-Matt
--
Yours,
Zheng