I think you can do that with regex replace and GROUP BY. Something like this- select replaced_col, count(1) from (select regex_replace(original_col, '.*(text1|text2|text3).*', '$1') as replaced_col from table)
On 7/28/09, Andraz Tori <[email protected]> wrote: > Hi, > I am a beginner at Hive's SQL so I am sorry if this question is answered > somewhere else. I tried to find the answer in Wiki, but no luck. > > I have a dataset in which one of the columns is text. I need to count > number of records that match certain regex on that column. There are a > number of different regexes that I need to count records for and it > seems that there should be a way to do it in one pass through the data > with hive. > > So how do I do it? > > This doesn't seem to work (on Hive 0.3): > select sum(textcolumn LIKE "%%sometext%%") as CountA, sum(textcolumn > LIKE "%%othertext%%") as CountB from sometable; > > > Thank you for your help in advance! > > > [unrelated: We've written a simple AWS S3 log format deserializer for > Hive. It needs some polishing - if anyone needs it or wants to polish it > up for inclusion, let me know] > > -- > Andraz Tori, CTO > Zemanta Ltd, New York, London, Ljubljana > www.zemanta.com > mail: [email protected] > tel: +386 41 515 767 > twitter: andraz, skype: minmax_test > > > > -- http://nandz.blogspot.com http://foodieforlife.blogspot.com
