Hi Folks
I've been doing some release engineering around Pig 0.7 and thought I
would share this in case any of you have it baked into a distribution.
Using the steps below you can shrink the distribution from 44MB to a
runtime-only distribution of 26MB. If I've missed something, or if anything
I'm suggesting here has negative ramifications, I'd love to know.
1) Delete everything in the lib directory, then copy the following files
into it:
   commons-el.jar
   commons-httpclient-3.0.1.jar
   commons-logging-1.0.4.jar
   hadoop-0.20.2-core.jar
   hbase-0.20.6.jar
   hbase-0.20.6-test.jar
   jline-0.9.94.jar
   log4j-1.2.15.jar
2) Delete the Pig jars in $PIG_HOME except pig-0.7.1-dev-core.jar, and copy
that jar into the lib directory
3) Add the following to bin/pig so that grunt still works:
   for f in $PIG_DIR/lib/*.jar; do
       CLASSPATH=${CLASSPATH}:$f;
   done
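For anyone who wants to script steps 1 and 2, here is a rough sketch rather than a tested release tool. The function name slim_pig_distro is made up for illustration, and it assumes the runtime jars listed in step 1 are already sitting in lib (it prunes everything else instead of copying them in from elsewhere). It only removes plain files, so any subdirectories under lib are left alone.

```shell
#!/bin/sh
# Sketch of steps 1 and 2. slim_pig_distro is a hypothetical helper name.
# Assumption: the jars on the keep list are already present in lib/.

KEEP_JARS="commons-el.jar commons-httpclient-3.0.1.jar commons-logging-1.0.4.jar hadoop-0.20.2-core.jar hbase-0.20.6.jar hbase-0.20.6-test.jar jline-0.9.94.jar log4j-1.2.15.jar"
CORE_JAR="pig-0.7.1-dev-core.jar"

slim_pig_distro() {
    pig_home=$1
    if [ ! -d "$pig_home/lib" ]; then
        echo "no lib directory under $pig_home" >&2
        return 1
    fi

    # Step 1: remove every file in lib/ that is not on the keep list.
    for f in "$pig_home"/lib/*; do
        [ -f "$f" ] || continue
        name=$(basename "$f")
        case " $KEEP_JARS " in
            *" $name "*) ;;            # on the keep list, leave it alone
            *) rm -f "$f" ;;
        esac
    done

    # Step 2: delete the other Pig jars at the top level, keeping the core jar.
    for f in "$pig_home"/pig*.jar; do
        [ -f "$f" ] || continue
        if [ "$(basename "$f")" != "$CORE_JAR" ]; then
            rm -f "$f"
        fi
    done

    # Copy the core jar into lib/ so a classpath loop over lib/*.jar finds it.
    if [ -f "$pig_home/$CORE_JAR" ]; then
        cp "$pig_home/$CORE_JAR" "$pig_home/lib/"
    fi
    return 0
}
```

You would call it as slim_pig_distro "$PIG_HOME", ideally against a copy of the distribution, since the deletes are not reversible.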
Lastly, some observations:
- According to its JIRA ticket, automaton.jar is part of Pig 0.8. What is
the jar doing in Pig 0.7?
- Those who ship Pig need to run legal scans on the software to ensure that
all the dependencies (jars in the lib folder) have friendly licenses and can
be shipped along with the base project. Files like Hadoop20.jar, where
Hadoop, all of its dependencies, and a bunch of classes of undetermined
origin are compiled into a single jar, make this extremely difficult. I'd
like to propose that future releases ship an independent jar for each
project in lib. Otherwise, for each class we have to work out which project
it belongs to (to determine its license) and which version it is, based on
the package name and the date of the class file.
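To give a feel for the per-class detective work described above, here is a small sketch that groups the class entries of a jar by package prefix. The helper name summarize_packages and the default prefix depth of 3 are my own choices for illustration. It reads entry names on stdin, so it works with any listing tool that prints one entry path per line.

```shell
#!/bin/sh
# summarize_packages is a hypothetical helper: it reads jar entry names on
# stdin and prints "count package" for each package prefix, truncated to
# the given number of path components (default 3).
summarize_packages() {
    depth=${1:-3}
    grep '\.class$' |
    awk -F/ -v depth="$depth" '{
        n = NF - 1                      # directory components before the class file
        if (n > depth) n = depth
        pkg = ""
        for (i = 1; i <= n; i++) pkg = pkg (i > 1 ? "." : "") $i
        if (pkg == "") pkg = "(default package)"
        count[pkg]++
    }
    END { for (p in count) print count[p], p }' |
    sort -k2
}
```

A typical call would be jar tf Hadoop20.jar | summarize_packages 3. This only surfaces which package prefixes are present; mapping each prefix to a project, a version, and a license is still manual, and the class dates would have to come from a separate listing such as unzip -l.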
Regards
Steve Watt