[Hadoop Wiki] Update of "Hive/LanguageManual/UDF" by MayankLahiri

Apache Wiki Thu, 19 Aug 2010 12:12:51 -0700

Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change 
notification.


The "Hive/LanguageManual/UDF" page has been changed by MayankLahiri.
The comment on this change is: added ngrams() and context_ngrams() to UDAF 
list.
http://wiki.apache.org/hadoop/Hive/LanguageManual/UDF?action=diff&rev1=49&rev2=50

--------------------------------------------------

  ||list ||split(string str, string pat) ||Split str around pat (pat is a 
regular expression) ||
  ||int ||find_in_set(string str, string strList) ||Returns the position of the 
first occurrence of str in strList, where strList is a comma-delimited string. 
Returns null if either argument is null. Returns 0 if the first argument 
contains any commas. e.g. find_in_set('ab', 'abc,b,ab,c,def') returns 3 ||
  ||array<array<string>> || sentences(string str, string lang, string locale) 
|| Tokenizes a string of natural language text into words and sentences, where 
each sentence is broken at the appropriate sentence boundary and returned as an 
array of words. The 'lang' and 'locale' are optional arguments. e.g. 
sentences('Hello there! How are you?') returns ( ("Hello", "there"), ("How", 
"are", "you") ) ||
+ ||array<struct<string,double>> || ngrams(array<array<string>>, int N, int K, 
int pf) || Returns the top-k N-grams from a set of tokenized sentences, such as 
those returned by the sentences() UDF. See [[Hive/StatisticsAndDataMining]] 
for more information. ||
+ ||array<struct<string,double>> || context_ngrams(array<array<string>>, 
array<string>, int K, int pf) || Returns the top-k contextual N-grams from a 
set of tokenized sentences, given an array of strings as "context". See 
[[Hive/StatisticsAndDataMining]] for more information. ||
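
The find_in_set() and ngrams() rows above describe behaviour that is easy to misread, so here is a minimal Python sketch of the semantics as the table states them. It is an emulation for illustration only, not Hive's implementation. The function names mirror the table, but top_k_ngrams is a hypothetical helper. Returning 0 when str is not found is an assumption the table does not state. The n-gram counts here are exact, so the 'pf' argument is ignored.

```python
from collections import Counter
from typing import List, Optional, Tuple


def find_in_set(s: Optional[str], str_list: Optional[str]) -> Optional[int]:
    """Emulate the find_in_set() rules listed in the table row."""
    if s is None or str_list is None:
        return None  # null if either argument is null
    if "," in s:
        return 0  # 0 if the first argument contains any commas
    items = str_list.split(",")
    # Positions are 1-based; 0 for "not found" is an assumption of this sketch.
    return items.index(s) + 1 if s in items else 0


def top_k_ngrams(
    sentences: List[List[str]], n: int, k: int
) -> List[Tuple[Tuple[str, ...], int]]:
    """Exact top-k n-gram counts over tokenized sentences (hypothetical helper)."""
    counts: Counter = Counter()
    for words in sentences:
        # Slide a window of n words; n-grams never cross sentence boundaries.
        for i in range(len(words) - n + 1):
            counts[tuple(words[i:i + n])] += 1
    return counts.most_common(k)


print(find_in_set("ab", "abc,b,ab,c,def"))  # 3, matching the table's example
```

The input to top_k_ngrams has the same shape as the sentences() output shown in the table, for example [["Hello", "there"], ["How", "are", "you"]].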
