Hi, I am a beginner at Hive's SQL so I am sorry if this question is answered somewhere else. I tried to find the answer in Wiki, but no luck.
I have a dataset in which one of the columns is text. I need to count number of records that match certain regex on that column. There are a number of different regexes that I need to count records for and it seems that there should be a way to do it in one pass through the data with hive. So how do I do it? This doesn't seem to work (on Hive 0.3): select sum(textcolumn LIKE "%%sometext%%") as CountA, sum(textcolumn LIKE "%%othertext%%") as CountB from sometable; Thank you for your help in advance! [unrelated: We've written a simple AWS S3 log format deserializer for Hive. It needs some polishing - if anyone needs it or wants to polish it up for inclusion, let me know] -- Andraz Tori, CTO Zemanta Ltd, New York, London, Ljubljana www.zemanta.com mail: [email protected] tel: +386 41 515 767 twitter: andraz, skype: minmax_test
