Earl Cahill
Mon, 06 Oct 2008 22:50:57 -0700
Alan,

Thanks. As you likely saw from a small deluge of JIRA emails, I created three issues, 472, 473, and 474, where the latter two depend on the first. Thought that might make ingestion a little easier.

Any thoughts on the Eclipse formatting file?

Thanks,
Earl

----- Original Message ----
From: Alan Gates <[EMAIL PROTECTED]>
To: pig-user@incubator.apache.org; Earl Cahill <[EMAIL PROTECTED]>
Sent: Monday, October 6, 2008 3:23:54 PM
Subject: Re: RegExLoader, MyRegExLoader, CommonLogLoader, formatting file

Earl,

Thanks for the contributions. The way to do this is to open a JIRA at http://issues.apache.org/jira/browse/PIG and attach your patch to it. We can then download the patch, review it, test it, etc.

Alan.

Earl Cahill wrote:
> Howdy,
>
> I was well on my way to writing a class that would load logs written
> in Apache's common log format, as has been discussed on this list.
> Then it hit me that really, I was just loading based on a regex, so I
> rearchitected into the following classes:
>
> RegExLoader - implements ReversibleLoadStoreFunc and does the heavy
> lifting of loading. It is an abstract class with just one abstract
> method, getPattern(), which returns the pattern you would like to load
> with. On a successful match, it walks through the find groups and
> adds each group to a tuple list.
>
> CommonLogLoader - extends RegExLoader with a regex for Apache's
> common log format.
>
> MyRegExLoader - extends RegExLoader and allows arbitrary regexes
> from Pig Latin, like
>
> A = LOAD 'file:test.txt' USING
> org.apache.pig.piggybank.storage.MyRegExLoader('(\\d+)!+(\\w+)~+(\\w+)');
>
> I added unit tests and some simple javadocs, but didn't want to go
> too far until I got a little feedback. I attached a patch file, and
> hope I did it right.
>
> I also attached an Eclipse formatter file. A former co-worker went
> through Sun's standards, plus several others where Sun's standard was
> silent, and made an Eclipse formatter file from them. My team has been
> using it and I like the idea. It keeps our code looking consistent,
> and there are many style decisions to make besides four-space
> indentation. Anyway, even if you Pig folks don't like this particular
> file, I definitely suggest having one for the project.
>
> In case the files don't make it, I also put them here:
>
> http://pig.holaservers.com/patch.txt
> http://pig.holaservers.com/formatter.xml
>
> Sorry for the potential double post, but I don't think anything I've
> sent has actually made it to pig-dev yet.
>
> Enjoy!
> Earl
>