UDF" by Pa ulYang

Apache Wiki Mon, 04 Jan 2010 15:03:54 -0800

Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change 
notification.


The "Hive/LanguageManual/UDF" page has been changed by PaulYang.
http://wiki.apache.org/hadoop/Hive/LanguageManual/UDF?action=diff&rev1=29&rev2=30

--------------------------------------------------

  ## page was renamed from Hive/HiveQL/UDF
  <<TableOfContents>>
  
- Built in operators and functions are supported in Hive.
- 
- == Built in Operators ==
+ == Built-in Operators ==
  === Relational Operators ===
- The following operators compare the passed operands and generate a TRUE or 
FALSE value depending on whether the comparison between the operands holds or 
not.
+ The following operators compare the passed operands and generate a TRUE or 
FALSE value depending on whether the comparison between the operands holds.
  ||<10%>'''Operator''' ||<10%>'''Operand types''' ||'''Description''' ||
- ||A = B ||all primitive types ||TRUE if expression A is equal to expression B 
otherwise FALSE ||
+ ||A = B ||All primitive types ||TRUE if expression A is equal to expression B 
otherwise FALSE ||
- ||A == B ||none! ||Fails; SQL uses = and not ==! ||
+ ||A == B ||None! || Fails because of invalid syntax. SQL uses =, not == ||
- ||A <> B ||all primitive types ||TRUE if expression A is NOT equal to 
expression B otherwise FALSE ||
+ ||A <> B ||All primitive types ||TRUE if expression A is NOT equal to 
expression B otherwise FALSE ||
- ||A < B ||all primitive types ||TRUE if expression A is less than expression 
B otherwise FALSE ||
+ ||A < B ||All primitive types ||TRUE if expression A is less than expression 
B otherwise FALSE ||
- ||A <= B ||all primitive types ||TRUE if expression A is less than or equal 
to expression B otherwise FALSE ||
+ ||A <= B ||All primitive types ||TRUE if expression A is less than or equal 
to expression B otherwise FALSE ||
- ||A > B ||all primitive types ||TRUE if expression A is greater than 
expression B otherwise FALSE ||
+ ||A > B ||All primitive types ||TRUE if expression A is greater than 
expression B otherwise FALSE ||
- ||A >= B ||all primitive types ||TRUE if expression A is greater than or 
equal to expression B otherwise FALSE ||
+ ||A >= B ||All primitive types ||TRUE if expression A is greater than or 
equal to expression B otherwise FALSE ||
  ||A IS NULL ||all types ||TRUE if expression A evaluates to NULL otherwise 
FALSE ||
- ||A IS NOT NULL ||all types ||TRUE if expression A evaluates to NULL 
otherwise FALSE ||
+ ||A IS NOT NULL ||All types ||TRUE if expression A evaluates to NULL 
otherwise FALSE ||
  ||A LIKE B ||strings ||TRUE if string A matches the SQL simple regular 
expression B, otherwise FALSE. The comparison is done character by character. 
The _ character in B matches any character in A(similar to . in posix regular 
expressions) while the % character in B matches an arbitrary number of 
characters in A(similar to .* in posix regular expressions) e.g. 'foobar' like 
'foo' evaluates to FALSE where as 'foobar' like 'foo_ _ _' evaluates to TRUE 
and so does 'foobar' like 'foo%' ||
  ||A RLIKE B ||strings ||TRUE if string A matches the Java regular expression 
B(See Java regular expressions syntax), otherwise FALSE e.g. 'foobar' rlike 
'foo' evaluates to FALSE where as 'foobar' rlike '^f.*r$' evaluates to TRUE ||
  ||A REGEXP B ||strings ||Same as RLIKE ||
@@ -26, +24 @@

  
  
  === Arithmetic Operators ===
- The following operators support various common arithmetic operations on the 
operands. All of them return number types.
+ The following operators support various common arithmetic operations on the 
operands. All return number types.
  ||<10%>'''Operator''' ||<10%>'''Operand types''' ||'''Description''' ||
- ||A + B ||all number types ||Gives the result of adding A and B. The type of 
the result is the same as the common parent(in the type hierarchy) of the types 
of the operands. e.g. since every integer is a float, therefore float is a 
containing type of integer so the + operator on a float and an int will result 
in a float. ||
+ ||A + B ||All number types ||Gives the result of adding A and B. The type of 
the result is the same as the common parent(in the type hierarchy) of the types 
of the operands. e.g. since every integer is a float, therefore float is a 
containing type of integer so the + operator on a float and an int will result 
in a float. ||
- ||A - B ||all number types ||Gives the result of subtracting B from A. The 
type of the result is the same as the common parent(in the type hierarchy) of 
the types of the operands. ||
+ ||A - B ||All number types ||Gives the result of subtracting B from A. The 
type of the result is the same as the common parent(in the type hierarchy) of 
the types of the operands. ||
- ||A * B ||all number types ||Gives the result of multiplying A and B. The 
type of the result is the same as the common parent(in the type hierarchy) of 
the types of the operands. Note that if the multiplication causing overflow, 
you will have to cast one of the operators to a type higher in the type 
hierarchy. ||
+ ||A * B ||All number types ||Gives the result of multiplying A and B. The 
type of the result is the same as the common parent(in the type hierarchy) of 
the types of the operands. Note that if the multiplication causing overflow, 
you will have to cast one of the operators to a type higher in the type 
hierarchy. ||
- ||A / B ||all number types ||Gives the result of dividing B from A. The type 
of the result is the same as the common parent(in the type hierarchy) of the 
types of the operands. If the operands are integer types, then the result is 
the quotient of the division. ||
+ ||A / B ||All number types ||Gives the result of dividing B from A. The type 
of the result is the same as the common parent(in the type hierarchy) of the 
types of the operands. If the operands are integer types, then the result is 
the quotient of the division. ||
- ||A % B ||all number types ||Gives the reminder resulting from dividing A by 
B. The type of the result is the same as the common parent(in the type 
hierarchy) of the types of the operands. ||
+ ||A % B ||All number types ||Gives the reminder resulting from dividing A by 
B. The type of the result is the same as the common parent(in the type 
hierarchy) of the types of the operands. ||
- ||A & B ||all number types ||Gives the result of bitwise AND of A and B. The 
type of the result is the same as the common parent(in the type hierarchy) of 
the types of the operands. ||
+ ||A & B ||All number types ||Gives the result of bitwise AND of A and B. The 
type of the result is the same as the common parent(in the type hierarchy) of 
the types of the operands. ||
- ||A | B ||all number types ||Gives the result of bitwise OR of A and B. The 
type of the result is the same as the common parent(in the type hierarchy) of 
the types of the operands. ||
+ ||A | B ||All number types ||Gives the result of bitwise OR of A and B. The 
type of the result is the same as the common parent(in the type hierarchy) of 
the types of the operands. ||
- ||A ^ B ||all number types ||Gives the result of bitwise XOR of A and B. The 
type of the result is the same as the common parent(in the type hierarchy) of 
the types of the operands. ||
+ ||A ^ B ||All number types ||Gives the result of bitwise XOR of A and B. The 
type of the result is the same as the common parent(in the type hierarchy) of 
the types of the operands. ||
- ||~A ||all number types ||Gives the result of bitwise NOT of A. The type of 
the result is the same as the type of A. ||
+ ||~A ||All number types ||Gives the result of bitwise NOT of A. The type of 
the result is the same as the type of A. ||
  
  
  
@@ -56, +54 @@

  
  === Operators on Complex Types ===
  The following operators provide mechanisms to access elements in  Complex 
Types
- ||<10%>Operator ||<10%>Operand types ||Description ||
+ ||<10%> '''Operator''' ||<10%> '''Operand types''' || '''Description''' ||
  ||A[n] ||A is an Array and n is an int ||Returns the nth element in the array 
A. The first element has index 0 e.g. if A is an array comprising of ['foo', 
'bar'] then A[0] returns 'foo' and A[1] returns 'bar' ||
  ||M[key] ||M is a Map<K, V> and key has type K ||Returns the value 
corresponding to the key in the map e.g. if M is a map comprising of {'f' -> 
'foo', 'b' -> 'bar', 'all' -> 'foobar'} then M['all'] returns 'foobar' ||
- ||S.x ||S is a struct ||Returns the x field of S e.g for struct foobar {int 
foo, int bar} foobar.foo returns the integer stored in the foo field of the 
struct. ||
+ ||S.x ||S is a struct ||Returns the x field of S. e.g for struct foobar {int 
foo, int bar} foobar.foo returns the integer stored in the foo field of the 
struct. ||
  
  
  
@@ -67, +65 @@

  == Built-in Functions ==
  === Mathematical Functions ===
  The following built-in mathematical functions are supported in hive:
- ||<10%>Return Type ||<15%>Name(Signature) ||Description ||
+ ||<10%> '''Return Type''' ||<15%> '''Name(Signature)''' || '''Description''' 
||
  ||BIGINT ||round(double a) ||Returns the rounded BIGINT value of the double ||
  ||BIGINT ||floor(double a) ||Returns the maximum BIGINT value that is equal 
or less than the double ||
  ||BIGINT ||ceil(double a), ceiling(double a) ||Returns the minimum BIGINT 
value that is equal or greater than the double ||
  ||double ||rand(), rand(int seed) ||Returns a random number (that changes 
from row to row). Specifiying the seed will make sure the generated random 
number sequence is deterministic. ||
- ||double ||exp(double a) ||Return e^a where e is the base of the natural 
logarithm ||
+ ||double ||exp(double a) ||Returns e^a where e is the base of the natural 
logarithm ||
- ||double ||ln(double a) ||Return the natural logarithm of the argument ||
+ ||double ||ln(double a) ||Returns the natural logarithm of the argument ||
- ||double ||log10(double a) ||Return the base-10 logarithm of the argument ||
+ ||double ||log10(double a) ||Returns the base-10 logarithm of the argument ||
- ||double ||log2(double a) ||Return the base-2 logarithm of the argument ||
+ ||double ||log2(double a) ||Returns the base-2 logarithm of the argument ||
  ||double ||log(double base, double a) ||Return the base "base" logarithm of 
the argument ||
  ||double ||pow(double a, double p) power(double a, double p) ||Return a^p ||
- ||double ||sqrt(double a) ||Return the square root of a ||
+ ||double ||sqrt(double a) ||Returns the square root of a ||
- ||string ||bin(BIGINT a) ||Return the number in binary format (see 
[[[http://dev.mysql.com/doc/refman/5.0/en/string-functions.html#function_bin]]])
 ||
+ ||string ||bin(BIGINT a) ||Returns the number in binary format (see 
[[[http://dev.mysql.com/doc/refman/5.0/en/string-functions.html#function_bin]]])
 ||
- ||string ||hex(BIGINT a) hex(string a) ||If the argument is a int, hex 
returns the number as a string in hex format. Otherwise if the number is a 
string, it converts each character into its hex representation and returns the 
resulting string.  (see 
[[[http://dev.mysql.com/doc/refman/5.0/en/string-functions.html#function_hex]]])
 ||
+ ||string ||hex(BIGINT a) hex(string a) ||If the argument is an int, hex 
returns the number as a string in hex format. Otherwise if the number is a 
string, it converts each character into its hex representation and returns the 
resulting string.  (see 
[[[http://dev.mysql.com/doc/refman/5.0/en/string-functions.html#function_hex]]])
 ||
  ||string ||unhex(string a) ||Inverse of hex. Interprets each pair of 
characters as a hexidecimal number and converts to the character represented by 
the number. ||
- ||string ||conv(BIGINT num, int from_base, int to_base) ||Convert a number 
from a given base to another (see 
[[[http://dev.mysql.com/doc/refman/5.0/en/mathematical-functions.html#function_conv]]])
 ||
+ ||string ||conv(BIGINT num, int from_base, int to_base) ||Converts a number 
from a given base to another (see 
[[[http://dev.mysql.com/doc/refman/5.0/en/mathematical-functions.html#function_conv]]])
 ||
- ||double ||abs(double a) ||Return the absolute value ||
+ ||double ||abs(double a) ||Returns the absolute value ||
  ||int double ||pmod(int a, int b) pmod(double a, double b) ||Returns the 
positive value of a mod b ||
  ||double ||sin(double a) ||Returns the sine of a (a is in radians) ||
  ||double ||asin(double a) ||Returns the arc sin of x if -1<=a<=1 or null 
otherwise ||
@@ -97, +95 @@

  
  === Collection Functions ===
  The following built-in collection functions are supported in hive:
+ ||<10%> '''Return Type''' ||<15%> '''Name(Signature)''' || '''Description''' 
||
- ||int ||size(Map<K.V>) ||returns the number of elements in the map type ||
+ ||int ||size(Map<K.V>) ||Returns the number of elements in the map type ||
- ||int ||size(Array<T>) ||returns the number of elements in the array type ||
+ ||int ||size(Array<T>) ||Returns the number of elements in the array type ||
  
  
  
  
  === Type Conversion Functions ===
  The following type conversion functions are supported in hive:
+ ||<10%> '''Return Type''' ||<15%> '''Name(Signature)''' || '''Description''' 
||
- || <type> ||cast(expr as <type>) ||converts the results of the expression 
expr to <type> e.g. cast('1' as BIGINT) will convert the string '1' to it 
integral representation. A null is returned if the conversion does not succeed. 
||
+ || <type> ||cast(expr as <type>) ||Converts the results of the expression 
expr to <type> e.g. cast('1' as BIGINT) will convert the string '1' to it 
integral representation. A null is returned if the conversion does not succeed. 
||
  
  
  
  
  === Date Functions ===
  The following built-in date functions are supported in hive:
- ||string ||from_unixtime(int unixtime) ||convert the number of seconds from 
unix epoch (1970-01-01 00:00:00 UTC) to a string representing the timestamp of 
that moment in the current system time zone in the format of "1970-01-01 
00:00:00" ||
+ ||string ||from_unixtime(int unixtime) ||Converts the number of seconds from 
unix epoch (1970-01-01 00:00:00 UTC) to a string representing the timestamp of 
that moment in the current system time zone in the format of "1970-01-01 
00:00:00" ||
- ||bigint ||unix_timestamp() ||Get current time stamp. It will use the default 
time zone. ||
+ ||bigint ||unix_timestamp() ||Gets current time stamp using the default time 
zone. ||
- ||bigint ||unix_timestamp(string date) ||Convert time string in format 
`yyyy-MM-dd HH:mm:ss` to Unix time stamp, return 0 if fail: 
unix_timestamp('2009-03-20 11:30:01') = 1237573801 ||
+ ||bigint ||unix_timestamp(string date) ||Converts time string in format 
`yyyy-MM-dd HH:mm:ss` to Unix time stamp, return 0 if fail: 
unix_timestamp('2009-03-20 11:30:01') = 1237573801 ||
  ||bigint ||unix_timestamp(string date, string pattern) ||Convert time string 
with given pattern (see 
[[[http://java.sun.com/j2se/1.4.2/docs/api/java/text/SimpleDateFormat.html]]]) 
to Unix time stamp, return 0 if fail: unix_timestamp('2009-03-20', 
'yyyy-MM-dd') = 1237532400 ||
- ||string ||to_date(string timestamp) ||Return the date part of a timestamp 
string: to_date("1970-01-01 00:00:00") = "1970-01-01" ||
+ ||string ||to_date(string timestamp) ||Returns the date part of a timestamp 
string: to_date("1970-01-01 00:00:00") = "1970-01-01" ||
- ||int ||year(string date) ||Return the year part of a date or a timestamp 
string: year("1970-01-01 00:00:00") = 1970, year("1970-01-01") = 1970 ||
+ ||int ||year(string date) ||Returns the year part of a date or a timestamp 
string: year("1970-01-01 00:00:00") = 1970, year("1970-01-01") = 1970 ||
- ||int ||month(string date) ||Return the month part of a date or a timestamp 
string: month("1970-11-01 00:00:00") = 11, month("1970-11-01") = 11 ||
+ ||int ||month(string date) ||Returns the month part of a date or a timestamp 
string: month("1970-11-01 00:00:00") = 11, month("1970-11-01") = 11 ||
  ||int ||day(string date) dayofmonth(date) ||Return the day part of a date or 
a timestamp string: day("1970-11-01 00:00:00") = 1, day("1970-11-01") = 1 ||
  ||int ||hour(string date) ||Returns the hour of the timestamp: 
hour('2009-07-30 12:58:59') = 12, hour('12:58:59') = 12 ||
  ||int ||minute(string date) ||Returns the minute of the timestamp ||
@@ -126, +126 @@

  ||int ||weekofyear(string date) ||Return the week number of a timestamp 
string: weekofyear("1970-11-01 00:00:00") = 44, weekofyear("1970-11-01") = 44 ||
  ||int ||datediff(string enddate, string startdate) ||Return the number of 
days from startdate to enddate: datediff('2009-03-01', '2009-02-27') = 2 ||
  ||int ||date_add(string startdate, int days) ||Add a number of days to 
startdate: date_add('2008-12-31', 1) = '2009-01-01' ||
- ||int ||date_sub(string startdate, int days) ||Substract a number of days to 
startdate: date_sub('2008-12-31', 1) = '2008-12-30' ||
+ ||int ||date_sub(string startdate, int days) ||Subtract a number of days to 
startdate: date_sub('2008-12-31', 1) = '2008-12-30' ||
  
  
  
@@ -140, +140 @@

  
  === String Functions ===
  The following are built-in String functions are supported in hive:
+ ||<10%> '''Return Type''' ||<15%> '''Name(Signature)''' || '''Description''' 
||
- ||int ||length(string A) ||returns the length of the string ||
+ ||int ||length(string A) ||Returns the length of the string ||
- ||string ||reverse(string A) ||returns the reversed string ||
+ ||string ||reverse(string A) ||Returns the reversed string ||
- ||string ||concat(string A, string B...) ||returns the string resulting from 
concatenating the strings passed in as parameters in order. e.g. concat('foo', 
'bar') results in 'foobar'. Note that this function can take any number of 
input strings. ||
+ ||string ||concat(string A, string B...) ||Returns the string resulting from 
concatenating the strings passed in as parameters in order. e.g. concat('foo', 
'bar') results in 'foobar'. Note that this function can take any number of 
input strings. ||
- ||string ||substr(string A, int start) substring(string A, int start) 
||returns the substring of A starting from start position till the end of 
string A e.g. concat('foobar', 4) results in 'bar' (see 
[[[http://dev.mysql.com/doc/refman/5.0/en/string-functions.html#function_substr]]])
 ||
+ ||string ||substr(string A, int start) substring(string A, int start) 
||Returns the substring of A starting from start position till the end of 
string A e.g. concat('foobar', 4) results in 'bar' (see 
[[[http://dev.mysql.com/doc/refman/5.0/en/string-functions.html#function_substr]]])
 ||
- ||string ||substr(string A, int start, int len) substring(string A, int 
start, int len) ||returns the substring of A starting from start position with 
length len e.g. concat('foobar', 4, 1) results in 'b' (see 
[[[http://dev.mysql.com/doc/refman/5.0/en/string-functions.html#function_substr]]])
 ||
+ ||string ||substr(string A, int start, int len) substring(string A, int 
start, int len) ||Returns the substring of A starting from start position with 
length len e.g. concat('foobar', 4, 1) results in 'b' (see 
[[[http://dev.mysql.com/doc/refman/5.0/en/string-functions.html#function_substr]]])
 ||
- ||string ||upper(string A) ucase(string A) ||returns the string resulting 
from converting all characters of A to upper case e.g. upper('fOoBaR') results 
in 'FOOBAR' ||
+ ||string ||upper(string A) ucase(string A) ||Returns the string resulting 
from converting all characters of A to upper case e.g. upper('fOoBaR') results 
in 'FOOBAR' ||
- ||string ||lower(string A) lcase(string A) ||returns the string resulting 
from converting all characters of B to lower case e.g. lower('fOoBaR') results 
in 'foobar' ||
+ ||string ||lower(string A) lcase(string A) ||Returns the string resulting 
from converting all characters of B to lower case e.g. lower('fOoBaR') results 
in 'foobar' ||
- ||string ||trim(string A) ||returns the string resulting from trimming spaces 
from both ends of A e.g. trim(' foobar ') results in 'foobar' ||
+ ||string ||trim(string A) ||Returns the string resulting from trimming spaces 
from both ends of A e.g. trim(' foobar ') results in 'foobar' ||
- ||string ||ltrim(string A) ||returns the string resulting from trimming 
spaces from the beginning(left hand side) of A e.g. ltrim(' foobar ') results 
in 'foobar ' ||
+ ||string ||ltrim(string A) ||Returns the string resulting from trimming 
spaces from the beginning(left hand side) of A e.g. ltrim(' foobar ') results 
in 'foobar ' ||
- ||string ||rtrim(string A) ||returns the string resulting from trimming 
spaces from the end(right hand side) of A e.g. rtrim(' foobar ') results in ' 
foobar' ||
+ ||string ||rtrim(string A) ||Returns the string resulting from trimming 
spaces from the end(right hand side) of A e.g. rtrim(' foobar ') results in ' 
foobar' ||
- ||string ||regexp_replace(string A, string B, string C) ||returns the string 
resulting from replacing all substrings in B that match the Java regular 
expression syntax(See Java regular expressions syntax) with C e.g. 
regexp_replace('foobar', 'oo|ar', '') returns 'fb''' ''''' ||
+ ||string ||regexp_replace(string A, string B, string C) ||Returns the string 
resulting from replacing all substrings in B that match the Java regular 
expression syntax(See Java regular expressions syntax) with C e.g. 
regexp_replace('foobar', 'oo|ar', '') returns 'fb' ||
- ||string ||regexp_extract(string subject, string pattern, int intex) 
||returns the string extracted using the pattern. e.g. 
regexp_extract('foothebar', 'foo(.*?)(bar)', 2) returns 'bar' ||
+ ||string ||regexp_extract(string subject, string pattern, int intex) 
||Returns the string extracted using the pattern. e.g. 
regexp_extract('foothebar', 'foo(.*?)(bar)', 2) returns 'bar' ||
- ||string ||parse_url(string urlString, string partToExtract) ||Returns the 
specified part from the URL. Valid values for partToExtract include HOST, PATH, 
QUERY, REF, PROTOCOL, AUTHORITY, FILE, and USERINFO. e.g. 
parse_url('http://facebook.com/path1/p.php?k1=v1&k2=v2#Ref1', 'HOST')  returns 
'facebook.com' Also you can get a value of particular key in QUERY, using the 
syntax QUERY:<KEY_NAME> e.g. QUERY:k1. ||
+ ||string ||parse_url(string urlString, string partToExtract) ||Returns the 
specified part from the URL. Valid values for partToExtract include HOST, PATH, 
QUERY, REF, PROTOCOL, AUTHORITY, FILE, and USERINFO. e.g. 
parse_url('http://facebook.com/path1/p.php?k1=v1&k2=v2#Ref1', 'HOST')  returns 
'facebook.com' Also a value of particular key in QUERY can be extracted using 
the syntax QUERY:<KEY_NAME> e.g. QUERY:k1. ||
  ||string ||get_json_object(string json_string, string path) ||Extract json 
object from a json string based on json path specified, and return json string 
of the extracted json object. It will return null if the input json string is 
invalid ||
  ||string ||space(int n) ||Return a string of n spaces ||
  ||string ||repeat(string str, int n) ||Repeat str n times ||
- ||string ||ascii(string str) ||Return the numeric value of the first 
character of str ||
+ ||string ||ascii(string str) ||Returns the numeric value of the first 
character of str ||
  ||string ||lpad(string str1, string str2) ||Pad str1 to the left with str2 ||
  ||string ||rpad(string str1, string str2) ||Pad str1 to the right with str2 ||
  ||list ||split(string str, string pat) ||Split str around pat (pat is a 
regular expression) ||
  ||int ||find_in_set(string str, string strList) ||Returns the first occurance 
of str in strList where strList is a comman-delimited string. Returns null if 
either argument is null. Returns 0 if the first argument contains any commas. 
e.g. find_in_set('ab', 'abc,b,ab,c,def') returns 3 ||
  
  
- 
+ === Misc. Functions ===
  
  ==== get_json_object ====
  A limited version of JSONPath is supported:
@@ -186, +187 @@

  Example: src_json table is a single column (json), single row table:
  
  {{{
- 
+-----------------------------------------------------------------------------------------------------------------------------------------+
+ +-------------------------------------------------------------------+
                                 json
- 
+-----------------------------------------------------------------------------------------------------------------------------------------+
+ +-------------------------------------------------------------------+
  {"store":
    {"fruit":[{"weight":8,"type":"apple"},{"weight":9,"type":"pear"}],
-    "book":[{"author":"Nigel Rees","category":"reference","title":"Sayings of 
the Century","price":8.95},
-            {"author":"Herman Melville","category":"fiction","title":"Moby 
Dick","price":8.99,"isbn":"0-553-21311-3"},
-            {"author":"J. R. R. Tolkien","category":"fiction","title":"The 
Lord of the Rings","price":22.99,"isbn":"0-395-19395-8"}
-           ],
     "bicycle":{"price":19.95,"color":"red"}
    },
   "email":"a...@only_for_json_udf_test.net",
   "owner":"amy"
- 
- 
+-----------------------------------------------------------------------------------------------------------------------------------------+
+ }
+ +-------------------------------------------------------------------+
  }}}
- You can extract json object using these queries:
+ The fields of the json object can be extracted using these queries:
  
  {{{
  hive> SELECT get_json_object(src_json.json, '$.owner') FROM src_json;
  amy
  
- hive> SELECT get_json_object(src_json.json, '$.store.book[0]') FROM src_json;
+ hive> SELECT get_json_object(src_json.json, '$.store.fruit[0]') FROM src_json;
- {"author":"Nigel Rees","category":"reference","title":"Sayings of the 
Century","price":8.95}
+ {"weight":8,"type":"apple"}
  
  hive> SELECT get_json_object(src_json.json, '$.non_exist_key') FROM src_json;
  NULL
  }}}
+ 
- == Built in Aggregate (GROUP BY) Functions ==
+ == Built in Aggregate Functions (UDAF) ==
- The following are built in aggregate functions are supported in hive:
+ The following are built-in aggregate functions are supported in Hive:
- ||<10%>Return Type''' ''' ||<10%>Name(Signature)''' ''' ||Description''' ''' 
||
+ ||<10%> '''Return Type''' ||<10%> '''Name(Signature)''' || '''Description''' 
||
  ||BIGINT ||count(1), count(DISTINCT col [, col]...) ||count(1) returns the 
number of members in the group, whereas the count(DISTINCT col) gets the count 
of distinct values of the columns in the group ||
- ||DOUBLE ||sum(col), sum(DISTINCT col) ||returns the sum of the elements in 
the group or the sum of the distinct values of the column in the group ||
+ ||double||sum(col), sum(DISTINCT col) ||Returns the sum of the elements in 
the group or the sum of the distinct values of the column in the group ||
- ||DOUBLE ||avg(col), avg(DISTINCT col) ||returns the average of the elements 
in the group or the average of the distinct values of the column in the group ||
+ ||double ||avg(col), avg(DISTINCT col) ||Returns the average of the elements 
in the group or the average of the distinct values of the column in the group ||
- ||DOUBLE ||min(col) ||returns the minimum of the column in the group ||
+ ||double ||min(col) ||Returns the minimum of the column in the group ||
- ||DOUBLE ||max(col) ||returns the maximum value of the column n the group ||
+ ||double ||max(col) ||Returns the maximum value of the column n the group ||
  
+ == Built in Table Generating Functions (UDTF) ==
+ 
+ Normal user-defined functions, such as concat(), take in a single input row 
and output a single output row. In contrast, table-generating functions 
transform a single input row to multiple output rows. Currently, the only 
table-generating function is explode(), which takes in an array as an input and 
outputs the elements of the array as separate rows. UDTF's can be used in the 
SELECT expression list and as a part of LATERAL VIEW.
+ 
+ An example use of explode() in the SELECT expression list is as follows:
+ 
+ Consider a table named myTable that has a single column (myCol) and two rows:
+ 
+ || Array<int> myCol ||
+ || [1,2,3] ||
+ || [4,5,6] ||
+ 
+ Then running the query:
+ 
+ SELECT explode(myCol) AS myNewCol FROM myTable;
+ 
+ Will produce:
+ 
+ || int myNewCol ||
+ || 1 ||
+ || 2 ||
+ || 3 ||
+ || 4 ||
+ || 5 ||
+ || 6 ||
+ 
+ Using the syntax "SELECT udtf(col) AS colAlias..." has a few limitations:
+ 
+   * No other expressions are allowed in SELECT 
+     * SELECT pageid, explode(adid_list) AS myCol... is not supported
+   * UDTF's can't be nested
+     * SELECT explode(explode(adid_list)) AS myCol... is not supported
+   * GROUP BY / CLUSTER BY / DISTRIBUTE BY / SORT BY is not supported
+     * SELECT explode(adid_list) AS myCol ... GROUP BY myCol is not supported
+ 
+ Please see lateral view for an alternative syntax that does not have these 
limitations.
+ 
+ The following are built-in table-generating functions are supported in Hive:
+ ||<10%> '''Return Type''' ||<10%> '''Name(Signature)''' || '''Description''' 
||
+ || <type> || explode(array<type> a) || For each element in a, explode() 
generates a row containing that element ||
+

[Hadoop Wiki] Update of "Hive/LanguageManual/UDF" by Pa ulYang

Reply via email to