Hi Saurabh,

Sorry for the late reply.

You can create a table as described in
https://issues.apache.org/jira/browse/HIVE-637
and then use the newly added UDF from
https://issues.apache.org/jira/browse/HIVE-642
to read in the data.

In this way, you won't need to write any Java code. Let us know if you
have any questions.
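For illustration only (with the split UDF, no Java is needed inside Hive itself), here is a plain Java sketch of what an expression like split(blob, "\t")[0] does for one row. The whole log line sits in a single string column, and indexing the split result picks out one field. TabFieldExtractor and field() are hypothetical names. The sketch assumes tab-separated fields and is not Hive's actual split implementation.

```java
import java.util.regex.Pattern;

// Hypothetical helper: mimics, in plain Java, what split(blob, "\t")[index]
// does for a single row. Not Hive's implementation; assumes tab-separated fields.
final class TabFieldExtractor {
    // Compile the delimiter once instead of on every call.
    private static final Pattern TAB = Pattern.compile("\t");

    private TabFieldExtractor() {}

    /** Returns the field at index, or null if the line is null or has too few fields. */
    static String field(String line, int index) {
        if (line == null || index < 0) {
            return null;
        }
        // limit -1 keeps trailing empty fields, so "a\t\t" still has 3 fields
        String[] parts = TAB.split(line, -1);
        return index < parts.length ? parts[index] : null;
    }

    public static void main(String[] args) {
        String row = "10.0.0.1\t-\t/index.html";
        System.out.println(field(row, 0)); // 10.0.0.1
        System.out.println(field(row, 2)); // /index.html
        System.out.println(field(row, 5)); // null
    }
}
```

Returning null for a missing field is a choice made in this sketch. Check the Hive documentation for how an out-of-range array index behaves there.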


In the longer term, we want to let our users write SerDes for that.
The benefit of a SerDe is that you will be able to refer to fields by
column name instead of writing
split(blob, "\t")[0], split(blob, "\t")[1], split(blob, "\t")[2], etc.
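To illustrate that benefit, here is a minimal Java sketch of the idea behind a SerDe. The column names are declared once, and each line is mapped from positions to names, so a field can be read as row.get("uri") rather than split(blob, "\t")[2]. NamedRowParser is a hypothetical class that assumes tab-separated input. It is not Hive's SerDe interface, which a real SerDe would implement.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical illustration of the SerDe idea: declare column names once,
// then look fields up by name instead of by position. This is NOT Hive's
// SerDe interface; it only shows the position-to-name mapping.
final class NamedRowParser {
    private static final Pattern TAB = Pattern.compile("\t");
    private final List<String> columns;

    NamedRowParser(List<String> columns) {
        this.columns = List.copyOf(columns); // defensive, immutable copy
    }

    /** Splits a tab-separated line and maps each position to its column name; missing fields become null. */
    Map<String, String> parse(String line) {
        String[] parts = TAB.split(line, -1);
        Map<String, String> row = new LinkedHashMap<>();
        for (int i = 0; i < columns.size(); i++) {
            row.put(columns.get(i), i < parts.length ? parts[i] : null);
        }
        return Collections.unmodifiableMap(row);
    }

    public static void main(String[] args) {
        NamedRowParser parser = new NamedRowParser(Arrays.asList("ip", "uid", "uri"));
        Map<String, String> row = parser.parse("10.0.0.1\t-\t/index.html");
        System.out.println(row.get("ip"));  // 10.0.0.1
        System.out.println(row.get("uri")); // /index.html
    }
}
```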

I didn't get time to write the SerDe how-to last week; I will start
writing it today. The how-to, along with some examples, will go into
the contrib directory (see
https://issues.apache.org/jira/browse/HIVE-639 ).

Zheng

On Thu, Jul 16, 2009 at 1:17 AM, Saurabh Nanda wrote:
>
>
>> So, I'm back to square one. Is there *any* way I can do this using Hive
>> alone? I'm fine with running the data through multiple passes, putting it in
>> temporary tables, if need be. Should I be looking at UDF or SerDe to achieve
>> this?
>
> One way I'm trying out is to have multiple UDFs, each taking the raw log
> entry as input and returning a specific field, for example
> extract_ip_address, extract_apache_uid, extract_uri, etc.
>
> Anything simpler?
>
> Saurabh.
> --
> http://nandz.blogspot.com
> http://foodieforlife.blogspot.com
>



-- 
Yours,
Zheng
