UDF" by hliu

Apache Wiki Wed, 21 Jan 2009 16:52:12 -0800

Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change 
notification.


The following page has been changed by hliu:
http://wiki.apache.org/hadoop/Hive/HiveQL/UDF

------------------------------------------------------------------------------
  [[TableOfContents]]
  
+ == Built in operators ==
+ Relational Operators - The following operators compare the passed operands 
and generate a TRUE or FALSE value depending on whether the comparison between 
the operands holds or not. 
+ ||<10%>'''Operator'''||<10%>'''Operand types'''||'''Description'''||
+ || A = B || all primitive types || TRUE if expression A is equal to 
expression B otherwise FALSE ||
+ ||A == B||none!||Fails; SQL uses = and not ==!||
+ ||A <> B||all primitive types||TRUE if expression A is NOT equal to 
expression B otherwise FALSE||
+ ||A < B||all primitive types||TRUE if expression A is less than expression B 
otherwise FALSE||
+ ||A <= B||all primitive types||TRUE if expression A is less than or equal to 
expression B otherwise FALSE||
+ ||A > B||all primitive types||TRUE if expression A is greater than expression 
B otherwise FALSE||
+ ||A >= B||all primitive types||TRUE if expression A is greater than or equal 
to expression B otherwise FALSE||
+ ||A IS NULL||all types||TRUE if expression A evaluates to NULL otherwise 
FALSE||
+ ||A IS NOT NULL||all types||TRUE if expression A evaluates to NULL otherwise 
FALSE||
+ ||A LIKE B||strings||TRUE if string A matches the SQL simple regular 
expression B, otherwise FALSE. The comparison is done character by character. 
The _ character in B matches any character in A(similar to . in posix regular 
expressions) while the % character in B matches an arbitrary number of 
characters in A(similar to .* in posix regular expressions) e.g. 'foobar' like 
'foo' evaluates to FALSE where as 'foobar' like 'foo___' evaluates to TRUE and 
so does 'foobar' like 'foo%'||
+ ||A RLIKE B||strings||TRUE if string A matches the Java regular expression 
B(See Java regular expressions syntax), otherwise FALSE e.g. 'foobar' rlike 
'foo' evaluates to FALSE where as 'foobar' rlike '^f.*r$' evaluates to TRUE||
+ ||A REGEXP B||strings||Same as RLIKE|| 
+ 
+ Arithmetic Operators - The following operators support various common 
arithmetic operations on the operands. All of them return number types. 
+ ||<10%>'''Operator'''||<10%>'''Operand types'''||'''Description'''||
+ ||A + B ||all number types ||Gives the result of adding A and B. The type of 
the result is the same as the common parent(in the type hierarchy) of the types 
of the operands. e.g. since every integer is a float, therefore float is a 
containing type of integer so the + operator on a float and an int will result 
in a float.||
+ ||A - B ||all number types ||Gives the result of subtracting B from A. The 
type of the result is the same as the common parent(in the type hierarchy) of 
the types of the operands.||
+ ||A * B ||all number types ||Gives the result of multiplying A and B. The 
type of the result is the same as the common parent(in the type hierarchy) of 
the types of the operands. Note that if the multiplication causing overflow, 
you will have to cast one of the operators to a type higher in the type 
hierarchy.||
+ ||A / B ||all number types ||Gives the result of dividing B from A. The type 
of the result is the same as the common parent(in the type hierarchy) of the 
types of the operands. If the operands are integer types, then the result is 
the quotient of the division.||
+ ||A % B ||all number types ||Gives the reminder resulting from dividing A by 
B. The type of the result is the same as the common parent(in the type 
hierarchy) of the types of the operands.||
+ ||A & B ||all number types ||Gives the result of bitwise AND of A and B. The 
type of the result is the same as the common parent(in the type hierarchy) of 
the types of the operands.||
+ ||A | B ||all number types ||Gives the result of bitwise OR of A and B. The 
type of the result is the same as the common parent(in the type hierarchy) of 
the types of the operands.||
+ ||A ^ B ||all number types ||Gives the result of bitwise XOR of A and B. The 
type of the result is the same as the common parent(in the type hierarchy) of 
the types of the operands.||
+ ||~A ||all number types ||Gives the result of bitwise NOT of A. The type of 
the result is the same as the type of A. ||
+ 
+ Logical Operators - The following operators provide support for creating 
logical expressions. All of them return boolean TRUE or FALSE depending upon 
the boolean values of the operands. 
+ ||<10%>'''Operator'''||<10%>'''Operand types'''||'''Description||
+ ||A AND B ||boolean ||TRUE if both A and B are TRUE, otherwise FALSE||
+ ||A && B ||boolean ||Same as A AND B||
+ ||A OR B ||boolean ||TRUE if either A or B or both are TRUE, otherwise FALSE||
+ ||A | | B ||boolean ||Same as A OR B||
+ ||NOT A ||boolean ||TRUE if A is FALSE, otherwise FALSE||
+ ||! A ||boolean ||Same as NOT A ||
+ 
+ Operators on Complex Types - The following operators provide mechanisms to 
access elements in  Complex Types
+ ||<10%>'''Operator'''||<10%>'''Operand types'''||'''Description'''||
+ ||A[n] ||A is an Array and n is an int ||returns the nth element in the array 
A. The first element has index 0 e.g. if A is an array comprising of ['foo', 
'bar'] then A[0] returns 'foo' and A[1] returns 'bar'||
+ ||M[key] ||M is a Map<K, V> and key has type K ||returns the value 
corresponding to the key in the map e.g. if M is a map comprising of {'f' -> 
'foo', 'b' -> 'bar', 'all' -> 'foobar'} then M['all'] returns 'foobar'||
+ ||S.x ||S is a struct ||returns the x field of S e.g for struct foobar {int 
foo, int bar} foobar.foo returns the integer stored in the foo field of the 
struct. ||
+ 
+ The following built in functions are supported in hive: 
+ ||<10%>'''Return Type'''||<10%>'''Name(Signature)'''||'''Description||
+ ||BIGINT ||round(double a) ||returns the rounded BIGINT value of the double||
+ ||BIGINT ||floor(double a) ||returns the maximum BIGINT value that is equal 
or less than the double||
+ ||BIGINT ||ceil(double a) ||returns the minimum BIGINT value that is equal or 
greater than the double||
+ ||double ||rand(), rand(int seed) ||returns a random number (that changes 
from row to row). Specifiying the seed will make sure the generated random 
number sequence is deterministic.||
+ ||string ||concat(string A, string B)||returns the string resulting from 
concatenating B after A e.g. concat('foo', 'bar') results in 'foobar'||
+ ||string ||substr(string A, int start) ||returns the substring of A starting 
from start position till the end of string A e.g. concat('foobar', 3) results 
in 'bar'||
+ ||string ||upper(string A)||returns the string resulting from converting all 
characters of A to upper case e.g. upper('fOoBaR') results in 'FOOBAR'||
+ ||string ||ucase(string A) ||Same as upper||
+ ||string ||lower(string A) ||returns the string resulting from converting all 
characters of B to lower case e.g. lower('fOoBaR') results in 'foobar'||
+ ||string ||lcase(string A) ||Same as lower||
+ ||string ||trim(string A) ||returns the string resulting from trimming spaces 
from both ends of A e.g. trim(' foobar ') results in 'foobar'||
+ ||string ||ltrim(string A) ||returns the string resulting from trimming 
spaces from the beginning(left hand side) of A e.g. ltrim(' foobar ') results 
in 'foobar '||
+ ||string ||rtrim(string A) ||returns the string resulting from trimming 
spaces from the end(right hand side) of A e.g. rtrim(' foobar ') results in ' 
foobar'||
+ ||string ||regexp_replace(string A, string B, string C) ||returns the string 
resulting from replacing all substrings in B that match the Java regular 
expression syntax(See Java regular expressions syntax) with C e.g. 
regexp_replace('foobar', 'oo|ar', '') returns 'fb'||
+ ||int ||size(Map<K.V>) ||returns the number of elements in the map type||
+ ||int ||size(Array<T>) ||returns the number of elements in the array type||
+ ||<type> ||cast(expr as <type>) ||converts the results of the expression expr 
to <type> e.g. cast('1' as BIGINT) will convert the string '1' to it integral 
representation. A null is returned if the conversion does not succeed.||
+ ||string ||from_unixtime(int unixtime) ||convert the number of seconds from 
unix epoch (1970-01-01 00:00:00 UTC) to a string representing the timestamp of 
that moment in the current system time zone in the format of "1970-01-01 
00:00:00"||
+ ||string ||to_date(string timestamp) ||Return the date part of a timestamp 
string: to_date("1970-01-01 00:00:00") = "1970-01-01"||
+ ||int ||year(string date) ||Return the year part of a date or a timestamp 
string: year("1970-01-01 00:00:00") = 1970, year("1970-01-01") = 1970||
+ ||int ||month(string date) ||Return the month part of a date or a timestamp 
string: month("1970-11-01 00:00:00") = 11, month("1970-11-01") = 11||
+ ||int ||day(string date) ||Return the day part of a date or a timestamp 
string: day("1970-11-01 00:00:00") = 1, day("1970-11-01") = 1 ||
+ 
+ The following built in aggregate functions are supported in hive: 
+ ||<10%>'''Return Type'''||<10%>'''Name(Signature)'''||'''Description||
+ ||BIGINT ||count(1), count(DISTINCT col [, col]...)||count(1) returns the 
number of members in the group, whereas the count(DISTINCT col) gets the count 
of distinct values of the columns in the group||
+ ||DOUBLE ||sum(col), sum(DISTINCT col) ||returns the sum of the elements in 
the group or the sum of the distinct values of the column in the group||
+ ||DOUBLE ||avg(col), avg(DISTINCT col) ||returns the average of the elements 
in the group or the average of the distinct values of the column in the group||
+ ||DOUBLE ||min(col) ||returns the minimum of the column in the group||
+ ||DOUBLE ||max(col) ||returns the maximum value of the column n the group||
+ 
+ 
+ 
+ 
+ 
+ 
  == UDF supported by Hive ==
- UDF are built in functions in Hive. Hive supports two type of UDFs: generic 
built in functions, and built in aggregate functions.
+ UDF are built in functions in Hive. Hive supports two type of UDFs: generic 
built in functions, and built in aggregate (GROUP BY) functions.
  
- === Built in Aggregate Functions ===
  
+ === Generic built in Functions ===
+ 
+ ==== count ====
+ Returns the number of members in the group, whereas the count(DISTINCT col) 
gets the count of distinct values of the columns in the group. The result is a 
BIGINT value. The DISTINCT option can be used to return the average of the 
distinct values of col.
+ {{{
+ count(1), count(DISTINCT col [, col]...) 
+ }}}
+ 
+ 
+ === Built in Aggregate (GROUP BY) Functions ===
+ 
+ ==== count ====
+ Returns the number of members in the group, whereas the count(DISTINCT col) 
gets the count of distinct values of the columns in the group. The result is a 
BIGINT value. The DISTINCT option can be used to return the average of the 
distinct values of col.
+ {{{
+ count(1), count(DISTINCT col [, col]...) 
+ }}}
+ 
+ ==== sum ====
+ Returns the sum of the elements in the group or the sum of the distinct 
values of the column in the group. The result is a DOUBLE value. 
+ {{{
+ sum(col), sum(DISTINCT col) 
+ }}}
+ 
+ ==== avg ====
+ Returns the average of the elements in the group or the average of the 
distinct values of the column in the group. The result is a DOUBLE value. 
+ {{{
+ count(1), count(DISTINCT col [, col]...) 
+ }}}
+ 
+ ==== min ====
+ Returns the minimum of the column in the group. The result is a DOUBLE value.
+ {{{
+ min(col)
+ }}}
+ 
+ ==== max ====
+ Returns the maximum of the column in the group. The result is a DOUBLE value.
+ {{{
+ max(col)
+ }}}
+

[Hadoop Wiki] Update of "Hive/HiveQL/UDF" by hliu

Reply via email to