[Hadoop Wiki] Update of "Hive/LanguageManual/UDF" by MayankLahiri

Apache Wiki Mon, 28 Jun 2010 13:36:39 -0700

Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change 
notification.


The "Hive/LanguageManual/UDF" page has been changed by MayankLahiri.
The comment on this change is: added entries for UDAFs: variance, stddev, 
percentile, percentile_approx, histogram_numeric.
http://wiki.apache.org/hadoop/Hive/LanguageManual/UDF?action=diff&rev1=43&rev2=44

--------------------------------------------------

  == Built-in Aggregate Functions (UDAF) ==
  The following built-in aggregate functions are supported in Hive:
  ||<10%>Return Type''' ''' ||<10%>Name(Signature)''' ''' ||Description''' ''' ||
- ||BIGINT ||count(1), count(DISTINCT col [, col]...) ||count(1) returns the number of members in the group, whereas the count(DISTINCT col) gets the count of distinct values of the columns in the group ||
+ ||bigint ||count(1), count(DISTINCT col [, col]...) ||count(1) returns the number of members in the group, whereas the count(DISTINCT col) gets the count of distinct values of the columns in the group ||
  ||double ||sum(col), sum(DISTINCT col) ||Returns the sum of the elements in the group or the sum of the distinct values of the column in the group ||
  ||double ||avg(col), avg(DISTINCT col) ||Returns the average of the elements in the group or the average of the distinct values of the column in the group ||
  ||double ||min(col) ||Returns the minimum of the column in the group ||
- ||double ||max(col) ||Returns the maximum value of the column n the group ||
+ ||double ||max(col) ||Returns the maximum value of the column in the group ||
- 
- 
- 
+ ||double ||var_pop(col) ||Returns the population variance of a numeric column in the group ||
+ ||double ||var_samp(col) ||Returns the unbiased sample variance of a numeric column in the group ||
+ ||double ||stddev_pop(col) ||Returns the population standard deviation of a numeric column in the group ||
+ ||double ||stddev_samp(col) ||Returns the sample standard deviation (the square root of var_samp) of a numeric column in the group ||
+ ||double ||percentile(col, p) ||Returns the exact p^th^ percentile of an integer column in the group (does not work with floating point types). p must be between 0 and 1. ||
+ ||array<double> ||percentile(col, array(p,,1,, [, p,,2,,]...)) ||Returns the exact percentiles p,,1,,, p,,2,,, ... of an integer column in the group (does not work with floating point types). p,,i,, must be between 0 and 1. ||
+ ||double ||percentile_approx(col, p [, B]) ||Returns an approximate p^th^ percentile of a numeric column (including floating point types) in the group. The B parameter controls approximation accuracy at the cost of memory. Higher values yield better approximations, and the default is 10,000. When the number of distinct values in col is smaller than B, this gives an exact percentile value. ||
+ ||array<double> ||percentile_approx(col, array(p,,1,, [, p,,2,,]...) [, B]) ||Same as above, but accepts and returns an array of percentile values instead of a single one. ||
+ ||array<struct `{'x','y'}`> ||histogram_numeric(col, b) ||Computes a histogram of a numeric column in the group using b non-uniformly spaced bins. The output is an array of size b of double-valued (x,y) coordinates that represent the bin centers and heights. ||
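The new UDAF rows describe their results only in words. The plain-Python sketch below spells out what each statistic computes for the values of a single group; it is not Hive's implementation, and the function names simply mirror the Hive functions for readability. Two details are assumptions not stated on this page: the linear interpolation rule used by `percentile`, and the "merge the two closest bins" rule used by `histogram_numeric` to keep at most b non-uniform bins.

```python
import math
from typing import List, Sequence, Tuple


def var_pop(xs: Sequence[float]) -> float:
    """Population variance: mean squared deviation from the mean (divide by n)."""
    n = len(xs)
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / n


def var_samp(xs: Sequence[float]) -> float:
    """Unbiased sample variance: same sum of squares, divided by n - 1.
    This sketch needs at least two values; Hive's single-row behavior is not modeled."""
    n = len(xs)
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / (n - 1)


def stddev_pop(xs: Sequence[float]) -> float:
    return math.sqrt(var_pop(xs))


def stddev_samp(xs: Sequence[float]) -> float:
    return math.sqrt(var_samp(xs))


def percentile(xs: Sequence[int], p: float) -> float:
    """Exact p-th percentile of a group.
    ASSUMPTION: linear interpolation between the two nearest ranks at
    position (n - 1) * p; check Hive's source for its exact rule."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    s = sorted(xs)
    pos = (len(s) - 1) * p
    lo, hi = math.floor(pos), math.ceil(pos)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)


def histogram_numeric(xs: Sequence[float], b: int) -> List[Tuple[float, float]]:
    """Hypothetical streaming histogram with at most b (center, height) bins.
    ASSUMPTION: when a new point would create more than b bins, the two
    adjacent bins with the closest centers merge into their height-weighted mean."""
    bins: List[List[float]] = []  # kept sorted by center
    for x in map(float, xs):
        for bin_ in bins:
            if bin_[0] == x:
                bin_[1] += 1.0  # existing center: just raise its height
                break
        else:
            bins.append([x, 1.0])
            bins.sort()
            if len(bins) > b:
                # Merge the adjacent pair whose centers are closest together.
                i = min(range(len(bins) - 1), key=lambda k: bins[k + 1][0] - bins[k][0])
                (x1, y1), (x2, y2) = bins[i], bins[i + 1]
                bins[i : i + 2] = [[(x1 * y1 + x2 * y2) / (y1 + y2), y1 + y2]]
    return [(c, h) for c, h in bins]
```

For the group 2, 4, 4, 4, 5, 5, 7, 9 the sum of squared deviations from the mean (5) is 32, so `var_pop` returns 4.0 and `stddev_pop` returns 2.0, while `var_samp` divides the same 32 by 7 instead of 8. With b = 2, the sketch turns 1, 2, 3, 10, 11, 12 into the bins (2.0, 3.0) and (11.0, 3.0).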
  
  == Built-in Table-Generating Functions (UDTF) ==
  <<Anchor(UDTF)>> Normal user-defined functions, such as concat(), take in a single input row and output a single output row. In contrast, table-generating functions transform a single input row into multiple output rows. Currently, the only table-generating function is explode(), which takes in an array as an input and outputs the elements of the array as separate rows. UDTFs can be used in the SELECT expression list and as a part of LATERAL VIEW.
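As a rough illustration of the one-row-in, many-rows-out contract, here is a Python sketch of what explode() does to one array, and of how a LATERAL VIEW pairs each source row with the rows explode() generates from it. Rows are modeled as plain tuples, and `lateral_view_explode` is a hypothetical helper for illustration, not a Hive API.

```python
from typing import Any, Iterable, Iterator, Sequence, Tuple


def explode(arr: Iterable[Any]) -> Iterator[Any]:
    """Table-generating function: one input array in, one output row per element."""
    for element in arr:
        yield element


def lateral_view_explode(
    rows: Iterable[Sequence[Any]], array_index: int
) -> Iterator[Tuple[Any, ...]]:
    """Hypothetical LATERAL VIEW: join each input row with every element that
    explode() produces from the array stored at position array_index.
    In this sketch a row whose array is empty yields no output rows."""
    for row in rows:
        for element in explode(row[array_index]):
            yield (*row, element)


rows = [("a", [1, 2]), ("b", []), ("c", [3])]
for out in lateral_view_explode(rows, 1):
    print(out)
# ('a', [1, 2], 1)
# ('a', [1, 2], 2)
# ('c', [3], 3)
```

Using a generator keeps the sketch streaming: each input row is expanded as it arrives, which mirrors the idea that a UDTF emits zero or more output rows per input row rather than materializing a whole table.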
