Hi Zheng,

Thanks for the reply, but I gave up on UDFs & SerDe and resorted to custom
map/reduce scripts instead. In case you're interested, I've written about my
Hive experience at
http://nandz.blogspot.com/2009/07/using-hive-for-weblog-analysis.html
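
For what it's worth, the queries ended up roughly along these lines (the
script name and output fields below are just placeholders, not my actual
code):

  ADD FILE parse_log.py;

  SELECT TRANSFORM (blob)
         USING 'parse_log.py'
         AS (ip_address, apache_uid, uri)
  FROM raw_logs;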

Saurabh.

On Thu, Jul 23, 2009 at 2:15 AM, Zheng Shao <[email protected]> wrote:

> Hi Saurabh,
>
> Sorry for the late reply.
>
> You can create a table using this:
> https://issues.apache.org/jira/browse/HIVE-637
> And then use the newly added UDF:
> https://issues.apache.org/jira/browse/HIVE-642
> to read in the data.
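>
> For instance, something along these lines (I'm assuming a table that
> holds each raw log line in a single string column; the names are only
> illustrative):
>
>   CREATE TABLE raw_logs (blob STRING);
>
>   SELECT split(blob, "\t")[0] AS ip_address,
>          split(blob, "\t")[1] AS apache_uid,
>          split(blob, "\t")[2] AS uri
>   FROM raw_logs;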
>
> In this way, you won't need to write any Java code. Let us know if you
> have any questions.
>
>
> In the longer term, we want to let our users write a SerDe for that.
> The benefit of a SerDe is that you will be able to use column names
> instead of
> split(blob, "\t")[0], split(blob, "\t")[1], split(blob, "\t")[2], etc.
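>
> With a SerDe declaring those columns, the same query would just be
> something like this (the column and table names are again only
> illustrative):
>
>   SELECT ip_address, apache_uid, uri
>   FROM weblogs;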
>
> I didn't get time to write the SerDe how-to last week, but I will start
> writing it today.
> The how-to will go into the contrib directory (see
> https://issues.apache.org/jira/browse/HIVE-639 ), along with some
> examples.
>
> Zheng
>
> On Thu, Jul 16, 2009 at 1:17 AM, Saurabh Nanda <[email protected]>
> wrote:
> >
> >
> >> So, I'm back to square one. Is there *any* way I can do this using Hive
> >> alone? I'm fine with running the data through multiple passes, putting
> >> it in temporary tables, if need be. Should I be looking at UDF or SerDe
> >> to achieve this?
> >
> > One way I'm trying out is to have multiple UDFs, each taking the raw log
> > entry as input and returning a specific field. For example,
> > extract_ip_address, extract_apache_uid, extract_uri, etc.
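> >
> > In other words, after registering those UDFs with CREATE TEMPORARY
> > FUNCTION, queries would look roughly like this (the table name is just
> > a placeholder):
> >
> >   SELECT extract_ip_address(blob),
> >          extract_apache_uid(blob),
> >          extract_uri(blob)
> >   FROM raw_logs;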
> >
> > Anything simpler?
> >
> > Saurabh.
> > --
> > http://nandz.blogspot.com
> > http://foodieforlife.blogspot.com
> >
>
>
>
> --
> Yours,
> Zheng
>



-- 
http://nandz.blogspot.com
http://foodieforlife.blogspot.com
