UDF" by hliu

Apache Wiki Wed, 21 Jan 2009 17:38:36 -0800

Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change 
notification.


The following page has been changed by hliu:
http://wiki.apache.org/hadoop/Hive/HiveQL/UDF

------------------------------------------------------------------------------
  [[TableOfContents]]
  
+ Built in operators and functions are supported in Hive.
+ 
- == Built in operators ==
+ == Built in Operators ==
  === Relational Operators ===
  The following operators compare the passed operands and generate a TRUE or 
FALSE value depending on whether the comparison between the operands holds or 
not. 
  ||<10%>'''Operator'''||<10%>'''Operand types'''||'''Description'''||
@@ -50, +52 @@

  ||M[key] ||M is a Map<K, V> and key has type K ||returns the value 
corresponding to the key in the map e.g. if M is a map comprising of {'f' -> 
'foo', 'b' -> 'bar', 'all' -> 'foobar'} then M['all'] returns 'foobar'||
  ||S.x ||S is a struct ||returns the x field of S e.g for struct foobar {int 
foo, int bar} foobar.foo returns the integer stored in the foo field of the 
struct. ||
  
+ == Built in Functions ==
- === Generic built in Functions ===
+ === Generic Built in Functions ===
  The following are built in functions are supported in hive: 
- ||<10%>'''Return Type'''||<10%>'''Name(Signature)'''||'''Description||
+ ||<8%>'''Return Type'''||<15%>'''Name(Signature)'''||'''Description||
  ||BIGINT ||round(double a) ||returns the rounded BIGINT value of the double||
  ||BIGINT ||floor(double a) ||returns the maximum BIGINT value that is equal 
or less than the double||
  ||BIGINT ||ceil(double a) ||returns the minimum BIGINT value that is equal or 
greater than the double||
@@ -75, +78 @@

  ||int ||year(string date) ||Return the year part of a date or a timestamp 
string: year("1970-01-01 00:00:00") = 1970, year("1970-01-01") = 1970||
  ||int ||month(string date) ||Return the month part of a date or a timestamp 
string: month("1970-11-01 00:00:00") = 11, month("1970-11-01") = 11||
  ||int ||day(string date) ||Return the day part of a date or a timestamp 
string: day("1970-11-01 00:00:00") = 1, day("1970-11-01") = 1 ||
+ ||string ||get_json_object(string json_string, string path) ||Extract json 
object from a json string based on json path specified, and return json string 
of the extracted json object. It will return null if the input json string is 
invalid||
+ 
+ ==== get_json_object ====
+ A limited version of JSONPath supported:
+    *    $   : Root object
+    *    .   : Child operator
+    *    []  : Subscript operator for array
+    *    *   : Wildcard for []
+ Syntax not supported that's worth noticing:
+    *    ''  : Zero length string as key
+    *    ..  : Recursive descent
+    *    @   : Current object/element
+    *    ()  : Script expression
+    *    ?() : Filter (script) expression.
+    *    [,] : Union operator
+    *    [start:end:step] : array slice operator
+ Example
+ src_json table is a single column (json), single row table:
+ {{{
+ 
+-----------------------------------------------------------------------------------------------------------------------------------------+
+                                json
+ 
+-----------------------------------------------------------------------------------------------------------------------------------------+
+ {"store":                                                                     
                                                           
+   {"fruit":[{"weight":8,"type":"apple"},{"weight":9,"type":"pear"}],          
                                                            
+    "book":[{"author":"Nigel Rees","category":"reference","title":"Sayings of 
the Century","price":8.95},
+            {"author":"Herman Melville","category":"fiction","title":"Moby 
Dick","price":8.99,"isbn":"0-553-21311-3"},
+            {"author":"J. R. R. Tolkien","category":"fiction","title":"The 
Lord of the Rings","price":22.99,"isbn":"0-395-19395-8"}
+           ],
+    "bicycle":{"price":19.95,"color":"red"}
+   },
+  "email":"a...@only_for_json_udf_test.net",
+  "owner":"amy"
+ 
+ 
+-----------------------------------------------------------------------------------------------------------------------------------------+
+ }}}
+ {{{
+ SELECT get_json_object(src_json.json, '$') FROM src_json;
+ 
+ SELECT get_json_object(src_json.json, '$.owner') FROM src_json; 
+ 
+ SELECT get_json_object(src_json.json, '$.store.book[0]') FROM src_json;
+ 
+ SELECT get_json_object(src_json.json, '$.non_exist_key') FROM src_json;
+ }}}
+ {{{
+ 
{"store":{"fruit":[{"weight":8,"type":"apple"},{"weight":9,"type":"pear"}],"book":[{"author":"Nigel
 Rees","category":"reference","title":"Sayings of the 
Century","price":8.95},{"author":"Herman 
Melville","category":"fiction","title":"Moby 
Dick","price":8.99,"isbn":"0-553-21311-3"},{"author":"J. R. R. 
Tolkien","category":"fiction","title":"The Lord of the 
Rings","price":22.99,"reader":[{"name":"bob","age":25},{"name":"jack","age":26}],"isbn":"0-395-19395-8"}],"basket":[[1,2,{"b":"y","a":"x"}],[3,4],[5,6]],"bicycle":{"price":19.95,"color":"red"}},"email":"a...@only_for_json_udf_test.net","owner":"amy"}
+ 
+ amy
+ 
+ {"author":"Nigel Rees","category":"reference","title":"Sayings of the 
Century","price":8.95}    
+ 
+ NULL
+ }}}
  
  
  === Built in Aggregate (GROUP BY) Functions ===
@@ -86, +142 @@

  ||DOUBLE ||min(col) ||returns the minimum of the column in the group||
  ||DOUBLE ||max(col) ||returns the maximum value of the column n the group||
  
- 
- 
- 
- 
- 
- == UDF supported by Hive ==
- UDF are built in functions in Hive. Hive supports two type of UDFs: generic 
built in functions, and built in aggregate (GROUP BY) functions.
- 
- 
- === Generic built in Functions ===
- 
- ==== count ====
- Returns the number of members in the group, whereas the count(DISTINCT col) 
gets the count of distinct values of the columns in the group. The result is a 
BIGINT value. The DISTINCT option can be used to return the average of the 
distinct values of col.
- {{{
- count(1), count(DISTINCT col [, col]...) 
- }}}
- 
- 
- === Built in Aggregate (GROUP BY) Functions ===
  
  ==== count ====
  Returns the number of members in the group, whereas the count(DISTINCT col) 
gets the count of distinct values of the columns in the group. The result is a 
BIGINT value. The DISTINCT option can be used to return the average of the 
distinct values of col.

[Hadoop Wiki] Update of "Hive/HiveQL/UDF" by hliu

Reply via email to