Earl Cahill
Thu, 25 Sep 2008 11:01:43 -0700
For now, I will put the examples up on a page on one of my own sites. As for
the code, maybe I could set up a Google Code project for Pig examples or
something?
I am mostly interested in parsing Apache logs, and I understand there are
likely other Pig uses, but here is some code I have either written or would
like to write:
CommonLogParser - parses the standard apache access_log
CombinedLogParser - parses a log based on the combined LogFormat
DayExtractor - given the standard apache time format (%t), extracts the day
(MM-dd-yyyy)
HostExtractor - given a url, extract the host
IsSearchBotHit - given a user agent, determine if the hit came from a search bot
IsPageView - given a userAgent and a uri, determine if the hit is a page view
(i.e., an html hit, rather than a js, image or whatever hit)
MyLength - return the length of the given field
SearchEngineExtractor - given the userAgent, when appropriate, return a name
for the search engine like "Google", "Google Uzbekistan" or "Godado"
SearchTermsExtractor - given the userAgent, when appropriate, return the search
terms
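To make the first few entries concrete, here is a minimal sketch of what a common-format parser and a day extractor could look like. It is plain Java with no Pig dependency, and the class and method names (CommonLogSketch, parse, extractDay) are made up for illustration; they are not the classes listed above. It assumes the default common LogFormat and does not handle escaped quotes inside the request field.

```java
import java.time.OffsetDateTime;
import java.time.format.DateTimeFormatter;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch, not the actual classes described in this message.
class CommonLogSketch {
    // Common Log Format: host ident authuser [time] "request" status bytes
    private static final Pattern COMMON = Pattern.compile(
        "^(\\S+) (\\S+) (\\S+) \\[([^\\]]+)\\] \"([^\"]*)\" (\\d{3}) (\\S+)$");

    // Apache %t without the surrounding brackets, e.g. 10/Oct/2000:13:55:36 -0700
    private static final DateTimeFormatter APACHE_TIME =
        DateTimeFormatter.ofPattern("dd/MMM/yyyy:HH:mm:ss Z", Locale.ENGLISH);

    private static final DateTimeFormatter DAY =
        DateTimeFormatter.ofPattern("MM-dd-yyyy", Locale.ENGLISH);

    /** Returns the seven fields of a common-format line, or null if it does not match. */
    static String[] parse(String line) {
        Matcher m = COMMON.matcher(line);
        if (!m.matches()) {
            return null;
        }
        String[] fields = new String[m.groupCount()];
        for (int i = 0; i < fields.length; i++) {
            fields[i] = m.group(i + 1);
        }
        return fields;
    }

    /** Converts an Apache %t value to MM-dd-yyyy, keeping the offset recorded in the log. */
    static String extractDay(String apacheTime) {
        return OffsetDateTime.parse(apacheTime, APACHE_TIME).format(DAY);
    }
}
```

The day is taken in the offset written in the log line rather than the local time zone of the machine running the job, so the result does not depend on where the script runs.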
Most of the classes are rather short, but I think folks would rather not have
to rewrite them. Except for the SearchTermsExtractor, I am either done or
pretty close on all of these. A couple of them, like the search engine
classes, may require some maintenance.
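As a rough illustration of where that upkeep comes from, here is a sketch of the bot and page-view checks driven by small lookup lists. Everything in it is an assumption for illustration: the class name HitClassifierSketch, the token and extension lists (which are far from complete), and the choice to not count search bot hits as page views.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical sketch, not the actual classes described in this message.
class HitClassifierSketch {
    // Illustrative crawler tokens only; a real list would need regular updates.
    private static final List<String> BOT_TOKENS =
        Arrays.asList("googlebot", "msnbot", "slurp");

    // Illustrative extensions for hits that are not page views.
    private static final List<String> ASSET_EXTENSIONS =
        Arrays.asList(".js", ".css", ".gif", ".jpg", ".png", ".ico");

    /** True if the user agent contains one of the known crawler tokens. */
    static boolean isSearchBotHit(String userAgent) {
        if (userAgent == null) {
            return false;
        }
        String ua = userAgent.toLowerCase(Locale.ROOT);
        for (String token : BOT_TOKENS) {
            if (ua.contains(token)) {
                return true;
            }
        }
        return false;
    }

    /** True if the hit is not from a known bot and the uri is not a js, css or image asset. */
    static boolean isPageView(String userAgent, String uri) {
        if (uri == null || isSearchBotHit(userAgent)) {
            return false;
        }
        // Ignore the query string when looking at the extension.
        int q = uri.indexOf('?');
        String path = (q >= 0 ? uri.substring(0, q) : uri).toLowerCase(Locale.ROOT);
        for (String ext : ASSET_EXTENSIONS) {
            if (path.endsWith(ext)) {
                return false;
            }
        }
        return true;
    }
}
```

Keeping the lists as data rather than burying them in the logic means new crawlers or file types can be added without touching the methods themselves.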
With these classes, I think I could do pretty much everything on my list.
I would like the classes to be production worthy and would be happy to
contribute them. Thoughts?
Thanks,
Earl
----- Original Message ----
From: Alan Gates <[EMAIL PROTECTED]>
To: pig-user@incubator.apache.org; Earl Cahill <[EMAIL PROTECTED]>
Sent: Thursday, September 25, 2008 9:29:14 AM
Subject: Re: some few examples
I don't think we have anything like this yet, but I think having a
PigUserExamples page, with links to pages of specific examples, like
yours, would be great. The PigUserExamples page could be linked off the
main page.
As far as where to put your code, if it's something that could actually
be used for pig scripts, it can go in contrib under piggybank (the user
contributed UDFs). If it's really for tutorial purposes and not
production worthy I'm not sure. We could add a tutorial section to
contrib. I think the existing tutorial is a unit aimed at helping
people get started, so we don't want to add to it.
Alan.
Earl Cahill wrote:
> howdy,
>
> Just starting to dive into pig, and have had a hard time finding examples. I
> would like to put up some examples (on the wiki?) of what I hope to be simple
> scripts that could help find the following on a per host / per day basis
>
> hits
> hits per canonized userAgent
> average microseconds to serve per uri
> hits per canonized search engine
> hits per canonized search engine terms
> bytes
> hits per referer
> hits per canonized referer host
> etc
>
> Has such a library already been started?
>
> Some of the scripts will have to rely on some java helper code, which I would
> be happy to contribute, but where can I put it? Perhaps in tutorial.jar?
> helper.jar? Anyone have thoughts about such things on the wiki?
>
> Thanks,
> Earl