Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change 
notification.

The "Hive/LanguageManual/UDF" page has been changed by AdamKramer.
The comment on this change is: Adds a caveat that has been messing me up; also 
corrects two typoes..
http://wiki.apache.org/hadoop/Hive/LanguageManual/UDF?action=diff&rev1=34&rev2=35

--------------------------------------------------

  <<TableOfContents>>
  
  NOTE: These commands will be able to show you the latest function 
documentations.
+ 
  {{{
  SHOW FUNCTIONS;
  DESCRIBE FUNCTION <function_name>;
  }}}
- 
  == Built-in Operators ==
  === Relational Operators ===
  The following operators compare the passed operands and generate a TRUE or 
FALSE value depending on whether the comparison between the operands holds.
  ||<10%>'''Operator''' ||<10%>'''Operand types''' ||'''Description''' ||
  ||A = B ||All primitive types ||TRUE if expression A is equal to expression B 
otherwise FALSE ||
- ||A == B ||None! || Fails because of invalid syntax. SQL uses =, not == ||
+ ||A == B ||None! ||Fails because of invalid syntax. SQL uses =, not == ||
  ||A <> B ||All primitive types ||TRUE if expression A is NOT equal to 
expression B otherwise FALSE ||
  ||A < B ||All primitive types ||TRUE if expression A is less than expression 
B otherwise FALSE ||
  ||A <= B ||All primitive types ||TRUE if expression A is less than or equal 
to expression B otherwise FALSE ||
@@ -60, +60 @@

  
  === Operators on Complex Types ===
  The following operators provide mechanisms to access elements in  Complex 
Types
- ||<10%> '''Operator''' ||<10%> '''Operand types''' || '''Description''' ||
+ ||<10%>'''Operator''' ||<10%>'''Operand types''' ||'''Description''' ||
  ||A[n] ||A is an Array and n is an int ||Returns the nth element in the array 
A. The first element has index 0 e.g. if A is an array comprising of ['foo', 
'bar'] then A[0] returns 'foo' and A[1] returns 'bar' ||
  ||M[key] ||M is a Map<K, V> and key has type K ||Returns the value 
corresponding to the key in the map e.g. if M is a map comprising of {'f' -> 
'foo', 'b' -> 'bar', 'all' -> 'foobar'} then M['all'] returns 'foobar' ||
  ||S.x ||S is a struct ||Returns the x field of S. e.g for struct foobar {int 
foo, int bar} foobar.foo returns the integer stored in the foo field of the 
struct. ||
@@ -71, +71 @@

  == Built-in Functions ==
  === Mathematical Functions ===
  The following built-in mathematical functions are supported in hive:
- ||<10%> '''Return Type''' ||<15%> '''Name(Signature)''' || '''Description''' 
||
+ ||<10%>'''Return Type''' ||<15%>'''Name(Signature)''' ||'''Description''' ||
  ||BIGINT ||round(double a) ||Returns the rounded BIGINT value of the double ||
  ||BIGINT ||floor(double a) ||Returns the maximum BIGINT value that is equal 
or less than the double ||
  ||BIGINT ||ceil(double a), ceiling(double a) ||Returns the minimum BIGINT 
value that is equal or greater than the double ||
@@ -101, +101 @@

  
  === Collection Functions ===
  The following built-in collection functions are supported in hive:
- ||<10%> '''Return Type''' ||<15%> '''Name(Signature)''' || '''Description''' 
||
+ ||<10%>'''Return Type''' ||<15%>'''Name(Signature)''' ||'''Description''' ||
  ||int ||size(Map<K.V>) ||Returns the number of elements in the map type ||
  ||int ||size(Array<T>) ||Returns the number of elements in the array type ||
  
@@ -110, +110 @@

  
  === Type Conversion Functions ===
  The following type conversion functions are supported in hive:
- ||<10%> '''Return Type''' ||<15%> '''Name(Signature)''' || '''Description''' 
||
+ ||<10%>'''Return Type''' ||<15%>'''Name(Signature)''' ||'''Description''' ||
- || <type> ||cast(expr as <type>) ||Converts the results of the expression 
expr to <type> e.g. cast('1' as BIGINT) will convert the string '1' to it 
integral representation. A null is returned if the conversion does not succeed. 
||
+ ||'''Expected "=" to follow "type"''' ||cast(expr as <type>) ||Converts the 
results of the expression expr to <type> e.g. cast('1' as BIGINT) will convert 
the string '1' to it integral representation. A null is returned if the 
conversion does not succeed. ||
  
  
  
@@ -146, +146 @@

  
  === String Functions ===
  The following are built-in String functions are supported in hive:
- ||<10%> '''Return Type''' ||<15%> '''Name(Signature)''' || '''Description''' 
||
+ ||<10%>'''Return Type''' ||<15%>'''Name(Signature)''' ||'''Description''' ||
  ||int ||length(string A) ||Returns the length of the string ||
  ||string ||reverse(string A) ||Returns the reversed string ||
  ||string ||concat(string A, string B...) ||Returns the string resulting from 
concatenating the strings passed in as parameters in order. e.g. concat('foo', 
'bar') results in 'foobar'. Note that this function can take any number of 
input strings. ||
@@ -157, +157 @@

  ||string ||trim(string A) ||Returns the string resulting from trimming spaces 
from both ends of A e.g. trim(' foobar ') results in 'foobar' ||
  ||string ||ltrim(string A) ||Returns the string resulting from trimming 
spaces from the beginning(left hand side) of A e.g. ltrim(' foobar ') results 
in 'foobar ' ||
  ||string ||rtrim(string A) ||Returns the string resulting from trimming 
spaces from the end(right hand side) of A e.g. rtrim(' foobar ') results in ' 
foobar' ||
- ||string ||regexp_replace(string A, string B, string C) ||Returns the string 
resulting from replacing all substrings in B that match the Java regular 
expression syntax(See Java regular expressions syntax) with C e.g. 
regexp_replace('foobar', 'oo|ar', '') returns 'fb' ||
- ||string ||regexp_extract(string subject, string pattern, int intex) 
||Returns the string extracted using the pattern. e.g. 
regexp_extract('foothebar', 'foo(.*?)(bar)', 2) returns 'bar' ||
+ ||string ||regexp_replace(string A, string B, string C) ||Returns the string 
resulting from replacing all substrings in B that match the Java regular 
expression syntax(See Java regular expressions syntax) with C e.g. 
regexp_replace("foobar", "oo|ar", "") returns 'fb.' Note that some care is 
necessary in using predefined character classes: using '\s' as the second 
argument will match the letter s; '\\s' is necessary to match whitespace, etc. 
||
+ ||string ||regexp_extract(string subject, string pattern, int intex) 
||Returns the string extracted using the pattern. e.g. 
regexp_extract('foothebar', 'foo(.*?)(bar)', 2) returns 'bar.' Note that some 
care is necessary in using predefined character classes: using '\s' as the 
second argument will match the letter s; '\\s' is necessary to match 
whitespace, etc. ||
  ||string ||parse_url(string urlString, string partToExtract) ||Returns the 
specified part from the URL. Valid values for partToExtract include HOST, PATH, 
QUERY, REF, PROTOCOL, AUTHORITY, FILE, and USERINFO. e.g. 
parse_url('http://facebook.com/path1/p.php?k1=v1&k2=v2#Ref1', 'HOST')  returns 
'facebook.com' Also a value of particular key in QUERY can be extracted using 
the syntax QUERY:<KEY_NAME> e.g. QUERY:k1. ||
  ||string ||get_json_object(string json_string, string path) ||Extract json 
object from a json string based on json path specified, and return json string 
of the extracted json object. It will return null if the input json string is 
invalid ||
  ||string ||space(int n) ||Return a string of n spaces ||
@@ -167, +167 @@

  ||string ||lpad(string str, int len, string pad) ||Returns str, left-padded 
with pad to a length of len ||
  ||string ||rpad(string str, int len, string pad) ||Returns str, right-padded 
with pad to a length of len ||
  ||list ||split(string str, string pat) ||Split str around pat (pat is a 
regular expression) ||
- ||int ||find_in_set(string str, string strList) ||Returns the first occurance 
of str in strList where strList is a comman-delimited string. Returns null if 
either argument is null. Returns 0 if the first argument contains any commas. 
e.g. find_in_set('ab', 'abc,b,ab,c,def') returns 3 ||
+ ||int ||find_in_set(string str, string strList) ||Returns the first occurance 
of str in strList where strList is a comma-delimited string. Returns null if 
either argument is null. Returns 0 if the first argument contains any commas. 
e.g. find_in_set('ab', 'abc,b,ab,c,def') returns 3 ||
+ 
+ 
  
  
  === Misc. Functions ===
- 
  ==== get_json_object ====
  A limited version of JSONPath is supported:
  
@@ -217, +218 @@

  hive> SELECT get_json_object(src_json.json, '$.non_exist_key') FROM src_json;
  NULL
  }}}
- 
  == Built-in Aggregate Functions (UDAF) ==
  The following are built-in aggregate functions are supported in Hive:
- ||<10%> '''Return Type''' ||<10%> '''Name(Signature)''' || '''Description''' 
||
+ ||<10%>Return Type''' ''' ||<10%>Name(Signature)''' ''' ||Description''' ''' 
||
  ||BIGINT ||count(1), count(DISTINCT col [, col]...) ||count(1) returns the 
number of members in the group, whereas the count(DISTINCT col) gets the count 
of distinct values of the columns in the group ||
- ||double||sum(col), sum(DISTINCT col) ||Returns the sum of the elements in 
the group or the sum of the distinct values of the column in the group ||
+ ||double ||sum(col), sum(DISTINCT col) ||Returns the sum of the elements in 
the group or the sum of the distinct values of the column in the group ||
  ||double ||avg(col), avg(DISTINCT col) ||Returns the average of the elements 
in the group or the average of the distinct values of the column in the group ||
  ||double ||min(col) ||Returns the minimum of the column in the group ||
  ||double ||max(col) ||Returns the maximum value of the column n the group ||
  
+ 
+ 
+ 
  == Built-in Table-Generating Functions (UDTF) ==
- <<Anchor(UDTF)>>
- Normal user-defined functions, such as concat(), take in a single input row 
and output a single output row. In contrast, table-generating functions 
transform a single input row to multiple output rows. Currently, the only 
table-generating function is explode(), which takes in an array as an input and 
outputs the elements of the array as separate rows. UDTF's can be used in the 
SELECT expression list and as a part of LATERAL VIEW.
+ <<Anchor(UDTF)>> Normal user-defined functions, such as concat(), take in a 
single input row and output a single output row. In contrast, table-generating 
functions transform a single input row to multiple output rows. Currently, the 
only table-generating function is explode(), which takes in an array as an 
input and outputs the elements of the array as separate rows. UDTF's can be 
used in the SELECT expression list and as a part of LATERAL VIEW.
  
  An example use of explode() in the SELECT expression list is as follows:
  
  Consider a table named myTable that has a single column (myCol) and two rows:
- 
- || Array<int> myCol ||
+ ||Array<int> myCol ||
- || [1,2,3] ||
+ ||[1,2,3] ||
- || [4,5,6] ||
+ ||[4,5,6] ||
+ 
+ 
+ 
  
  Then running the query:
  
  SELECT explode(myCol) AS myNewCol FROM myTable;
  
  Will produce:
- 
- || int myNewCol ||
+ ||int myNewCol ||
- || 1 ||
+ ||1 ||
- || 2 ||
+ ||2 ||
- || 3 ||
+ ||3 ||
- || 4 ||
+ ||4 ||
- || 5 ||
+ ||5 ||
- || 6 ||
+ ||6 ||
+ 
+ 
+ 
  
  Using the syntax "SELECT udtf(col) AS colAlias..." has a few limitations:
  
-   * No other expressions are allowed in SELECT 
+  * No other expressions are allowed in SELECT
-     * SELECT pageid, explode(adid_list) AS myCol... is not supported
+   * SELECT pageid, explode(adid_list) AS myCol... is not supported
-   * UDTF's can't be nested
+  * UDTF's can't be nested
-     * SELECT explode(explode(adid_list)) AS myCol... is not supported
+   * SELECT explode(explode(adid_list)) AS myCol... is not supported
-   * GROUP BY / CLUSTER BY / DISTRIBUTE BY / SORT BY is not supported
+  * GROUP BY / CLUSTER BY / DISTRIBUTE BY / SORT BY is not supported
-     * SELECT explode(adid_list) AS myCol ... GROUP BY myCol is not supported
+   * SELECT explode(adid_list) AS myCol ... GROUP BY myCol is not supported
  
  Please see [wiki:Self:Hive/LanguageManual/LateralView] for an alternative 
syntax that does not have these limitations.
-  
+ 
  The following are built-in table-generating functions are supported in Hive:
- ||<10%> '''Return Type''' ||<10%> '''Name(Signature)''' || '''Description''' 
||
+ ||<10%>Return Type''' ''' ||<10%>Name(Signature)''' ''' ||Description''' ''' 
||
- || <type> || explode(array<type> a) <<Anchor(explode)>> || For each element 
in a, explode() generates a row containing that element ||
+ ||'''Expected "=" to follow "type"''' ||explode(array<type> a) 
<<Anchor(explode)>> ||For each element in a, explode() generates a row 
containing that element ||
  
+ 
+ 

Reply via email to