+1. This will greatly simplify (or rather, enable) the use of Pig from within other systems (like Oozie), as it will allow proper component dependency resolution.
Thanks.
Alejandro

On Thu, Dec 9, 2010 at 3:37 AM, Stephen Watt <[email protected]> wrote:
> Hi Folks,
>
> I've been doing some release engineering around Pig 0.7 and thought I
> would share this in case any of you have it baked into a distribution.
> Using the steps below you can shrink the current distro from 44MB to a
> runtime-only distro of 26MB. Also, if I've missed something, or if
> anything I'm suggesting here has negative ramifications, I'd love to know.
>
> 1) Delete everything in the lib directory and copy the following files
>    into it: commons-el.jar, commons-httpclient-3.0.1.jar,
>    commons-logging-1.0.4.jar, hadoop-0.20.2-core.jar, hbase-0.20.6.jar,
>    hbase-0.20.6-test.jar, jline-0.9.94.jar, log4j-1.2.15.jar
> 2) Delete the Pig jars in $PIG_HOME except pig-0.7.1-dev-core.jar, and
>    copy that one into the lib directory.
> 3) Add the following to bin/pig so that grunt still works:
>
>    for f in "$PIG_DIR"/lib/*.jar; do
>        CLASSPATH=${CLASSPATH}:$f
>    done
>
> Lastly, some observations:
>
> - According to its JIRA ticket, automaton.jar is part of Pig 0.8, so what
>   is that jar doing in Pig 0.7?
>
> - Those who ship Pig need to run legal scans on the software to ensure
>   all the dependencies (the jars in the lib folder) have friendly licenses
>   and can be shipped along with the base project. Files like Hadoop20.jar,
>   where Hadoop, all of its dependencies and a bunch of classes of
>   undetermined origin are compiled into a single jar, make this extremely
>   difficult. I'd like to propose that future releases ship an independent
>   jar for each project in lib. Otherwise, for each class we have to work
>   out which project it belongs to (to determine its license) and which
>   version it is, based on the package name and the date of the classes.
>
> Regards
> Steve Watt
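For anyone repeating Steve's steps 1 and 2 across several builds, here is a rough bash sketch. It assumes the runtime jars from step 1 have already been gathered into one source directory (the mail doesn't say where they come from), and slim_pig_distro is a hypothetical helper name. Step 3 is left as a manual edit: the classpath loop has to run before bin/pig launches java, and this sketch doesn't try to find that spot.

```shell
#!/usr/bin/env bash
# Sketch of steps 1 and 2 from the mail. Assumptions (not from the mail):
# the runtime jars sit in one source directory, and slim_pig_distro is a
# hypothetical helper name. Step 3 (editing bin/pig) is NOT automated.

# Runtime jars listed in step 1.
RUNTIME_JARS=(
  commons-el.jar
  commons-httpclient-3.0.1.jar
  commons-logging-1.0.4.jar
  hadoop-0.20.2-core.jar
  hbase-0.20.6.jar
  hbase-0.20.6-test.jar
  jline-0.9.94.jar
  log4j-1.2.15.jar
)
# The one Pig jar kept in step 2.
PIG_CORE_JAR=pig-0.7.1-dev-core.jar

# Usage: slim_pig_distro PIG_HOME JAR_SRC_DIR
slim_pig_distro() {
  local pig_home=$1 jar_src=$2
  # Refuse to run on an empty or missing PIG_HOME, since lib/ gets deleted.
  if [ -z "$pig_home" ] || [ ! -d "$pig_home" ]; then
    echo "not a directory: '$pig_home'" >&2
    return 1
  fi

  # Stage the new lib/ in a temp dir first, so a missing jar leaves the
  # existing distro untouched (JAR_SRC_DIR may even be the old lib/).
  local stage jar
  stage=$(mktemp -d) || return 1
  for jar in "${RUNTIME_JARS[@]}"; do
    cp "$jar_src/$jar" "$stage/" || { rm -rf "$stage"; return 1; }
  done
  # Step 2 (copy half): the core jar also goes into lib/.
  cp "$pig_home/$PIG_CORE_JAR" "$stage/" || { rm -rf "$stage"; return 1; }

  # Step 1: replace lib/ wholesale with the staged runtime jars.
  chmod 755 "$stage"
  rm -rf "$pig_home/lib"
  mv "$stage" "$pig_home/lib"

  # Step 2 (delete half): drop every other top-level Pig jar.
  local f
  for f in "$pig_home"/*.jar; do
    [ "$(basename "$f")" = "$PIG_CORE_JAR" ] || rm -f "$f"
  done
}
```

Something like `slim_pig_distro /opt/pig-0.7.1 ~/pig-runtime-jars` (both paths made up) would then be followed by the manual bin/pig edit from step 3. Staging into a temp directory is a deliberate choice: a typo in the jar list fails before anything in the distro is deleted.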
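On the licensing point, one way to get a first cut of which projects a combined jar like Hadoop20.jar pulls in is to group its class files by package prefix. This is a minimal bash/awk sketch. The function name package_prefixes and the default depth of 3 are hypothetical choices. A package prefix only hints at the originating project; it says nothing reliable about version or license.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read a jar listing (one entry per line) on stdin and
# print each distinct package prefix, cut to the first DEPTH path components
# (default 3), with the number of .class files under it. Non-class entries
# such as META-INF/MANIFEST.MF are ignored.
package_prefixes() {
  local depth=${1:-3}
  awk -F/ -v depth="$depth" '
    BEGIN { d = depth + 0 }
    /\.class$/ {
      # Directory components to keep (the last field is the class file itself).
      n = (NF - 1 < d) ? NF - 1 : d
      if (n == 0) {
        prefix = "(default package)"
      } else {
        prefix = $1
        for (i = 2; i <= n; i++) prefix = prefix "." $i
      }
      count[prefix]++
    }
    END { for (p in count) printf "%s\t%d\n", p, count[p] }
  ' | LC_ALL=C sort
}
```

Fed from `jar tf hadoop20.jar` or `unzip -Z1 hadoop20.jar`, this gives one line per prefix (for example `org.apache.hadoop` followed by a count). The entries under unfamiliar prefixes are the ones that still need a manual origin check.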
