pig-user  

Re: RegExLoader, MyRegExLoader, CommonLogLoader, formatting file

Alan Gates
Mon, 06 Oct 2008 14:26:34 -0700

Earl,

Thanks for the contributions. The way to do this is to open a JIRA at http://issues.apache.org/jira/browse/PIG and attach your patch to it. We can then download the patch, review it, and test it.

Alan.

Earl Cahill wrote:
Howdy,

I was well on my way to writing a class that would load logs written in Apache's common log format, as has been discussed on this list. Then it hit me that really, I was just loading based on a regex, so I rearchitected it into the following classes:

RegExLoader - implements ReversibleLoadStoreFunc and does the heavy lifting of loading. It is an abstract class with just one abstract method, getPattern(), which returns the pattern you would like to load with. On a successful match, it walks through the matched groups and adds each group to a tuple list.

CommonLogLoader - extends RegExLoader with a regex for Apache's common log format

MyRegExLoader - extends RegExLoader and allows for arbitrary regexes from Pig Latin, like

A = LOAD 'file:test.txt' USING org.apache.pig.piggybank.storage.MyRegExLoader('(\\d+)!+(\\w+)~+(\\w+)');
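
The design described above can be sketched in plain Java without any Pig dependencies. This is only an illustration, not the code from the patch: the class names RegExLoaderSketch, LineParser and ArbitraryParser are hypothetical, the use of Matcher.find() is an assumption, and returning null for a non-matching line is a choice made for this sketch (the behavior of the real loader on such lines is not stated here). The sample line "1!!!one~i" is made up to fit the regex in the LOAD statement.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegExLoaderSketch {

    // Hypothetical stand-in for RegExLoader: subclasses only supply a pattern.
    abstract static class LineParser {
        abstract Pattern getPattern();

        // Returns the capture groups of the first match, or null if the
        // line does not match (an assumption made for this sketch).
        List<String> parse(String line) {
            Matcher m = getPattern().matcher(line);
            if (!m.find()) {
                return null;
            }
            List<String> fields = new ArrayList<>();
            // Group 0 is the whole match, so start at group 1.
            for (int i = 1; i <= m.groupCount(); i++) {
                fields.add(m.group(i));
            }
            return fields;
        }
    }

    // Hypothetical stand-in for MyRegExLoader: the regex arrives as a string.
    static class ArbitraryParser extends LineParser {
        private final Pattern pattern;

        ArbitraryParser(String regex) {
            this.pattern = Pattern.compile(regex);
        }

        @Override
        Pattern getPattern() {
            return pattern;
        }
    }

    public static void main(String[] args) {
        // Same regex as in the LOAD statement above.
        LineParser parser = new ArbitraryParser("(\\d+)!+(\\w+)~+(\\w+)");
        System.out.println(parser.parse("1!!!one~i"));      // prints [1, one, i]
        System.out.println(parser.parse("no match here"));  // prints null
    }
}
```

Keeping getPattern() as the only abstract method means a fixed-format loader such as CommonLogLoader would need nothing more than a subclass that returns its own compiled pattern.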

I added unit tests and some simple Javadocs, but didn't want to go too far until I got a little feedback. I attached a patch file, and hope I did it right.

I also attached an Eclipse formatter file. A former co-worker went through Sun's coding standards, plus several others where Sun's standard was silent, and made an Eclipse formatter file. My team has been using it and I like the idea. It makes our code look rather similar, and there are many style decisions to make besides four-space indent. Anyway, even if you Pig folks don't like this particular file, I definitely suggest having one for the project.

In case the attachments don't make it, I also put the files here:

http://pig.holaservers.com/patch.txt
http://pig.holaservers.com/formatter.xml

Sorry for the potential double post, but I don't think anything I've sent has actually made it to pig-dev yet.

Enjoy!
Earl