Dear Wiki user, You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change notification.
The following page has been changed by hliu: http://wiki.apache.org/hadoop/Hive/HiveQL/UDF ------------------------------------------------------------------------------ [[TableOfContents]] + Built in operators and functions are supported in Hive. + - == Built in operators == + == Built in Operators == === Relational Operators === The following operators compare the passed operands and generate a TRUE or FALSE value depending on whether the comparison between the operands holds or not. ||<10%>'''Operator'''||<10%>'''Operand types'''||'''Description'''|| @@ -50, +52 @@ ||M[key] ||M is a Map<K, V> and key has type K ||returns the value corresponding to the key in the map e.g. if M is a map comprising of {'f' -> 'foo', 'b' -> 'bar', 'all' -> 'foobar'} then M['all'] returns 'foobar'|| ||S.x ||S is a struct ||returns the x field of S e.g for struct foobar {int foo, int bar} foobar.foo returns the integer stored in the foo field of the struct. || + == Built in Functions == - === Generic built in Functions === + === Generic Built in Functions === The following are built in functions are supported in hive: - ||<10%>'''Return Type'''||<10%>'''Name(Signature)'''||'''Description|| + ||<8%>'''Return Type'''||<15%>'''Name(Signature)'''||'''Description|| ||BIGINT ||round(double a) ||returns the rounded BIGINT value of the double|| ||BIGINT ||floor(double a) ||returns the maximum BIGINT value that is equal or less than the double|| ||BIGINT ||ceil(double a) ||returns the minimum BIGINT value that is equal or greater than the double|| @@ -75, +78 @@ ||int ||year(string date) ||Return the year part of a date or a timestamp string: year("1970-01-01 00:00:00") = 1970, year("1970-01-01") = 1970|| ||int ||month(string date) ||Return the month part of a date or a timestamp string: month("1970-11-01 00:00:00") = 11, month("1970-11-01") = 11|| ||int ||day(string date) ||Return the day part of a date or a timestamp string: day("1970-11-01 00:00:00") = 1, day("1970-11-01") = 1 || + ||string ||get_json_object(string json_string, string path) ||Extract json object from a json string based on json path specified, and return json string of the extracted json object. It will return null if the input json string is invalid|| + + ==== get_json_object ==== + A limited version of JSONPath supported: + * $ : Root object + * . : Child operator + * [] : Subscript operator for array + * * : Wildcard for [] + Syntax not supported that's worth noticing: + * '' : Zero length string as key + * .. : Recursive descent + * @ : Current object/element + * () : Script expression + * ?() : Filter (script) expression. + * [,] : Union operator + * [start:end:step] : array slice operator + Example + src_json table is a single column (json), single row table: + {{{ + +-----------------------------------------------------------------------------------------------------------------------------------------+ + json + +-----------------------------------------------------------------------------------------------------------------------------------------+ + {"store": + {"fruit":[{"weight":8,"type":"apple"},{"weight":9,"type":"pear"}], + "book":[{"author":"Nigel Rees","category":"reference","title":"Sayings of the Century","price":8.95}, + {"author":"Herman Melville","category":"fiction","title":"Moby Dick","price":8.99,"isbn":"0-553-21311-3"}, + {"author":"J. R. R. Tolkien","category":"fiction","title":"The Lord of the Rings","price":22.99,"isbn":"0-395-19395-8"} + ], + "bicycle":{"price":19.95,"color":"red"} + }, + "email":"a...@only_for_json_udf_test.net", + "owner":"amy" + + +-----------------------------------------------------------------------------------------------------------------------------------------+ + }}} + {{{ + SELECT get_json_object(src_json.json, '$') FROM src_json; + + SELECT get_json_object(src_json.json, '$.owner') FROM src_json; + + SELECT get_json_object(src_json.json, '$.store.book[0]') FROM src_json; + + SELECT get_json_object(src_json.json, '$.non_exist_key') FROM src_json; + }}} + {{{ + {"store":{"fruit":[{"weight":8,"type":"apple"},{"weight":9,"type":"pear"}],"book":[{"author":"Nigel Rees","category":"reference","title":"Sayings of the Century","price":8.95},{"author":"Herman Melville","category":"fiction","title":"Moby Dick","price":8.99,"isbn":"0-553-21311-3"},{"author":"J. R. R. Tolkien","category":"fiction","title":"The Lord of the Rings","price":22.99,"reader":[{"name":"bob","age":25},{"name":"jack","age":26}],"isbn":"0-395-19395-8"}],"basket":[[1,2,{"b":"y","a":"x"}],[3,4],[5,6]],"bicycle":{"price":19.95,"color":"red"}},"email":"a...@only_for_json_udf_test.net","owner":"amy"} + + amy + + {"author":"Nigel Rees","category":"reference","title":"Sayings of the Century","price":8.95} + + NULL + }}} === Built in Aggregate (GROUP BY) Functions === @@ -86, +142 @@ ||DOUBLE ||min(col) ||returns the minimum of the column in the group|| ||DOUBLE ||max(col) ||returns the maximum value of the column n the group|| - - - - - - == UDF supported by Hive == - UDF are built in functions in Hive. Hive supports two type of UDFs: generic built in functions, and built in aggregate (GROUP BY) functions. - - - === Generic built in Functions === - - ==== count ==== - Returns the number of members in the group, whereas the count(DISTINCT col) gets the count of distinct values of the columns in the group. The result is a BIGINT value. The DISTINCT option can be used to return the average of the distinct values of col. - {{{ - count(1), count(DISTINCT col [, col]...) - }}} - - - === Built in Aggregate (GROUP BY) Functions === ==== count ==== Returns the number of members in the group, whereas the count(DISTINCT col) gets the count of distinct values of the columns in the group. The result is a BIGINT value. The DISTINCT option can be used to return the average of the distinct values of col.
