pig-user  

Re: RegExLoader, MyRegExLoader, CommonLogLoader, formatting file

Earl Cahill
Mon, 06 Oct 2008 22:50:57 -0700

Alan,

Thanks.  As you likely saw from a small deluge of JIRA emails, I created three
issues, 472, 473, and 474, where the latter two depend on the first.  Thought
that might make ingestion a little easier.

Any thoughts on the Eclipse formatting file?

Thanks,
Earl


----- Original Message ----
From: Alan Gates <[EMAIL PROTECTED]>
To: pig-user@incubator.apache.org; Earl Cahill <[EMAIL PROTECTED]>
Sent: Monday, October 6, 2008 3:23:54 PM
Subject: Re: RegExLoader, MyRegExLoader, CommonLogLoader, formatting file

Earl,

Thanks for the contributions.  The way to do this is to open a JIRA issue at
http://issues.apache.org/jira/browse/PIG and attach your patch to it.
We can then download the patch, review it, test it, etc.

Alan.

Earl Cahill wrote:
> Howdy,
>
> I was well on my way to writing a class that would load logs written
> in Apache's common log format, as has been discussed on this list.
> Then it hit me that really, I was just loading based on a regex, so I
> rearchitected into the following classes:
>
> RegExLoader - implements ReversibleLoadStoreFunc and does the heavy
> lifting of loading.  It is an abstract class with just one abstract
> method, getPattern(), which returns the pattern you would like to load
> with.  On a successful match, it walks through the matcher's capture
> groups and adds each group to a tuple list.
>
> CommonLogLoader - extends RegExLoader with a regex for Apache's
> common log format
>
> MyRegExLoader - extends RegExLoader and allows for arbitrary
> regexes from Pig Latin, like
>
> A = LOAD 'file:test.txt' USING 
> org.apache.pig.piggybank.storage.MyRegExLoader('(\\d+)!+(\\w+)~+(\\w+)');
>
> I added unit tests and some simple Javadoc, but didn't want to go
> too far till I got a little feedback.  I attached a patch file, and
> hope I did it right.
>
> I also attached an Eclipse formatter file.  A former co-worker went
> through Sun's standards, and several others where Sun's standard was
> silent, and made an Eclipse formatter file.  My team has been using it
> and I like the idea.  It makes our code look rather similar, and there
> are many style decisions to make besides four-space indent.  Anyway,
> even if all you Pig folks don't like this particular file, I
> definitely suggest having one for the project.
>
> In case the files don't make it, I also put them here
>
> http://pig.holaservers.com/patch.txt
> http://pig.holaservers.com/formatter.xml
>
> Sorry for the potential double post, but I don't think anything I
> have sent has actually made it to pig-dev yet.
>
> Enjoy!
> Earl
>
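
For readers who do not have the patch handy, the design described in the quoted message (an abstract loader with a single getPattern() hook, plus subclasses that only supply a regex) can be sketched in plain Java roughly as follows.  This is an illustration under stated assumptions, not the code from the patch: it has no Pig dependencies, the names LineRegexLoader, ArbitraryRegexLoader, CommonLogSketchLoader and parseLine are hypothetical, the common-log regex is one plausible pattern rather than the one the patch uses, and returning null for a non-matching line is an assumption, since the message does not say what happens in that case.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexLoaderSketch {

    /** Hypothetical stand-in for the abstract loader; not the Pig API. */
    abstract static class LineRegexLoader {

        /** The only thing a subclass has to supply. */
        abstract Pattern getPattern();

        /**
         * On a successful match, copy each capture group into a list
         * (standing in for a Pig tuple).  Returns null when the line does
         * not match; that choice is an assumption made for this sketch.
         */
        List<String> parseLine(String line) {
            Matcher m = getPattern().matcher(line);
            if (!m.find()) {
                return null;
            }
            List<String> fields = new ArrayList<>();
            // Group 0 is the whole match, so start at group 1.
            for (int i = 1; i <= m.groupCount(); i++) {
                fields.add(m.group(i));
            }
            return fields;
        }
    }

    /** Takes an arbitrary regex, in the spirit of MyRegExLoader. */
    static class ArbitraryRegexLoader extends LineRegexLoader {
        private final Pattern pattern;

        ArbitraryRegexLoader(String regex) {
            this.pattern = Pattern.compile(regex);
        }

        @Override
        Pattern getPattern() {
            return pattern;
        }
    }

    /** Fixed regex for common log format lines (assumed pattern). */
    static class CommonLogSketchLoader extends LineRegexLoader {
        // host ident user [timestamp] "request" status bytes
        private static final Pattern COMMON_LOG = Pattern.compile(
            "^(\\S+) (\\S+) (\\S+) \\[([^\\]]+)\\] \"([^\"]*)\" (\\d{3}) (\\S+)$");

        @Override
        Pattern getPattern() {
            return COMMON_LOG;
        }
    }

    public static void main(String[] args) {
        // Same regex as in the LOAD statement quoted above.
        LineRegexLoader custom =
            new ArbitraryRegexLoader("(\\d+)!+(\\w+)~+(\\w+)");
        System.out.println(custom.parseLine("1!!!one~i"));

        LineRegexLoader commonLog = new CommonLogSketchLoader();
        System.out.println(commonLog.parseLine(
            "127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "
            + "\"GET /apache_pb.gif HTTP/1.0\" 200 2326"));
    }
}
```

The point of the split is that all matching and group-walking logic lives in one place, so a new log format only needs a new pattern.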