Hi All,
As we discussed at ApacheCon, one way to reduce the size of POI distributions is to create a 'lite' version of the
ooxml-schemas jar.
The idea is simple: remove all unused classes and resources from the jar generated by XMLBeans. Rough estimates made
at the BarCamp showed that POI uses less than 30% of the OOXML schemas, so the optimized jar should be significantly
smaller.
With this in mind, I created a simple utility called OOXMLLite; see
http://svn.apache.org/repos/asf/poi/trunk/src/ooxml/java/org/apache/poi/util/OOXMLLite.java
The process consists of four simple steps:
- run all ooxml unit tests
- record which classes from ooxml-schemas.jar are loaded in the JVM
- copy the loaded classes into a target directory
- copy the binary resources (.xsb)
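The steps above can be sketched in Java as follows. This is not the OOXMLLite code: for step 2 it swaps in a different technique, parsing the log the JVM prints when the tests run with the -verbose:class flag, and it assumes the log lines and the list of entry names in the schemas jar have already been collected. The class LiteEntrySelector and its methods are hypothetical names. Like the current utility, it keeps every .xsb file.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch of steps 2-4; not the OOXMLLite implementation. */
public class LiteEntrySelector {

    // Matches the JDK 8 format "[Loaded a.b.C from ...]" and the
    // JDK 9+ unified-logging format "... [class,load] a.b.C source: ...".
    private static final Pattern LOADED =
            Pattern.compile("(?:\\[Loaded |\\[class,load\\] )([\\w.$]+)");

    /** Step 2: extract every class name reported in the -verbose:class output. */
    public static Set<String> loadedClasses(List<String> verboseLines) {
        Set<String> names = new LinkedHashSet<>();
        for (String line : verboseLines) {
            Matcher m = LOADED.matcher(line);
            if (m.find()) {
                names.add(m.group(1));
            }
        }
        return names;
    }

    /** Steps 3-4: keep .class entries for loaded classes, plus all .xsb resources. */
    public static List<String> entriesToCopy(Set<String> classNames, List<String> jarEntries) {
        List<String> selected = new ArrayList<>();
        for (String entry : jarEntries) {
            if (entry.endsWith(".xsb")) {
                // Resource dependencies are not tracked, so every .xsb is kept.
                selected.add(entry);
            } else if (entry.endsWith(".class")) {
                // "a/b/C$D.class" -> "a.b.C$D", the same form the JVM log uses.
                String className = entry
                        .substring(0, entry.length() - ".class".length())
                        .replace('/', '.');
                if (classNames.contains(className)) {
                    selected.add(entry);
                }
            }
        }
        return selected;
    }
}
```

Class names that have no matching entry in the schemas jar (JDK classes, test classes) never match in entriesToCopy, so no package filter is needed in step 2.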
A good acceptance test is to run the ooxml unit tests against the 'lite' classes; all of them should pass. There is an
accompanying Ant task, ooxml-xsds-lite, for this; see build.xml.
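Before running the full unit tests against the 'lite' classes, a quicker sanity check is to try loading each recorded class from the 'lite' output alone. The sketch below uses a hypothetical class, LiteClassCheck, which is not part of POI or of the ooxml-xsds-lite task. It assumes the classpath passed in holds the 'lite' classes directory plus the jars those classes depend on (such as xmlbeans), and it complements rather than replaces the unit-test run.

```java
import java.io.File;
import java.io.IOException;
import java.net.URL;
import java.net.URLClassLoader;
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

/** Hypothetical pre-check: which recorded classes cannot be loaded from the lite classpath? */
public class LiteClassCheck {

    public static List<String> missingClasses(List<File> classpath, Collection<String> classNames)
            throws IOException {
        URL[] urls = new URL[classpath.size()];
        for (int i = 0; i < urls.length; i++) {
            urls[i] = classpath.get(i).toURI().toURL();
        }
        List<String> missing = new ArrayList<>();
        // A null parent delegates only to the bootstrap loader, so the full
        // ooxml-schemas jar on the application classpath cannot hide gaps.
        try (URLClassLoader loader = new URLClassLoader(urls, null)) {
            for (String name : classNames) {
                try {
                    // initialize=false: load the class without running static initializers.
                    Class.forName(name, false, loader);
                } catch (ClassNotFoundException | LinkageError e) {
                    missing.add(name);
                }
            }
        }
        return missing;
    }
}
```

An empty result means every recorded class resolves from the 'lite' output; a non-empty one points at classes (or their superclasses) that were not copied.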
The resulting 'lite' jar is much smaller: ooxml-schemas-lite-3.6-beta1.jar is only 3.5 MB, while the full
ooxml-schemas-1.0.jar is 14.5 MB. In theory, the size could be trimmed below 3 MB: my utility currently copies all .xsb
files because it does not yet track resource dependencies.
I propose including ooxml-schemas-lite in the release cycle, with the artifact name
ooxml-schemas-lite-${version.id}.jar.
Interested projects (Apache Tika first of all) can set up their Maven POMs to use
<artifactId>poi-ooxml-lite</artifactId> instead of <artifactId>poi-ooxml</artifactId>. This will reduce their
distribution size by roughly 11 MB (14.5 MB versus 3.5 MB).
Yegor