Hi Zheng,

Thanks for the reply, but I gave up on UDFs & SerDe and resorted to custom map/reduce scripts instead. In case you're interested, I've written about my Hive experience at http://nandz.blogspot.com/2009/07/using-hive-for-weblog-analysis.html
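
The heart of it is a TRANSFORM query along these lines (a rough sketch with placeholder table, column, and script names rather than my exact setup):

    -- single-column table holding each raw access-log line as-is,
    -- plus a target table for the parsed fields
    CREATE TABLE raw_weblogs (line STRING);
    CREATE TABLE parsed_weblogs (ip STRING, apache_uid STRING, uri STRING);

    -- the external script reads raw lines on stdin and writes
    -- tab-separated fields (ip, apache_uid, uri) to stdout
    ADD FILE parse_access_log.py;

    FROM raw_weblogs
    INSERT OVERWRITE TABLE parsed_weblogs
    SELECT TRANSFORM(line)
      USING 'python parse_access_log.py'
      AS ip, apache_uid, uri;

All the actual parsing logic lives in the script; Hive just streams the lines through it and loads whatever comes back.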
Saurabh.

On Thu, Jul 23, 2009 at 2:15 AM, Zheng Shao <[email protected]> wrote:
> Hi Saurabh,
>
> Sorry for the late reply.
>
> You can create a table using this:
> https://issues.apache.org/jira/browse/HIVE-637
> And then use the newly added UDF:
> https://issues.apache.org/jira/browse/HIVE-642
> to read in the data.
>
> In this way, you won't need to write any Java code. Let us know if you
> have any questions.
>
> In the longer term, we want to let our users write a SerDe for that.
> The benefit of a SerDe is that you will be able to use column names
> instead of
> split(blob, "\t")[0], split(blob, "\t")[1], split(blob, "\t")[2], etc.
>
> I didn't get time to write the SerDe how-to last week. Will start to
> write it today.
> The how-to will go into the contrib directory (see
> https://issues.apache.org/jira/browse/HIVE-639) with some
> examples.
>
> Zheng
>
> On Thu, Jul 16, 2009 at 1:17 AM, Saurabh Nanda <[email protected]> wrote:
> >
> >> So, I'm back to square one. Is there *any* way I can do this using Hive
> >> alone? I'm fine with running the data through multiple passes, putting
> >> it in temporary tables, if need be. Should I be looking at UDFs or
> >> SerDes to achieve this?
> >
> > One way I'm trying out is to have multiple UDFs, each taking the raw log
> > entry as input and returning a specific field. For example,
> > extract_ip_address, extract_apache_uid, extract_uri, etc.
> >
> > Anything simpler?
> >
> > Saurabh.
> > --
> > http://nandz.blogspot.com
> > http://foodieforlife.blogspot.com
>
> --
> Yours,
> Zheng

--
http://nandz.blogspot.com
http://foodieforlife.blogspot.com
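
P.S. For reference, my reading of the split()-based route Zheng describes above, with placeholder table and column names (I haven't actually tried this against the HIVE-637 / HIVE-642 patches):

    -- one-column table holding the whole raw line as a single string
    CREATE TABLE raw_weblogs (blob STRING);

    -- pull fields out of the line by position;
    -- split() returns an array of strings, so every column is a string
    SELECT split(blob, "\t")[0] AS ip,
           split(blob, "\t")[1] AS apache_uid,
           split(blob, "\t")[2] AS uri
    FROM raw_weblogs;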
