Saraubh, seems like this approach would fail if a particular line might match more than one regex, for example if you're looking for lines that match "dog" or "good" and one line contains "dogood".
Andraz, have you tried something like: select sum(if(textcolumn like "...", 1, 0)) as CountA, sum(if(textcolumn like "...", 1, 0)) as CountB from sometable; On 7/27/09 3:00 PM, "Saurabh Nanda" <[email protected]> wrote: > I think you can do that with regex replace and GROUP BY. Something > like this- select replaced_col, count(1) from (select > regex_replace(original_col, '.*(text1|text2|text3).*', '$1') as > replaced_col from table) a group by a.replaced_col > > On 7/28/09, Andraz Tori <[email protected]> wrote: >> Hi, >> I am a beginner at Hive's SQL so I am sorry if this question is answered >> somewhere else. I tried to find the answer in Wiki, but no luck. >> >> I have a dataset in which one of the columns is text. I need to count >> number of records that match certain regex on that column. There are a >> number of different regexes that I need to count records for and it >> seems that there should be a way to do it in one pass through the data >> with hive. >> >> So how do I do it? >> >> This doesn't seem to work (on Hive 0.3): >> select sum(textcolumn LIKE "%%sometext%%") as CountA, sum(textcolumn >> LIKE "%%othertext%%") as CountB from sometable; >> >> >> Thank you for your help in advance! >> >> >> [unrelated: We've written a simple AWS S3 log format deserializer for >> Hive. It needs some polishing - if anyone needs it or wants to polish it >> up for inclusion, let me know] >> >> -- >> Andraz Tori, CTO >> Zemanta Ltd, New York, London, Ljubljana >> www.zemanta.com >> mail: [email protected] >> tel: +386 41 515 767 >> twitter: andraz, skype: minmax_test >> >> >> >> > > > -- > http://nandz.blogspot.com > http://foodieforlife.blogspot.com
