Hi All,

As we discussed at ApacheCon, one way to reduce the size of POI distributions is to create a 'lite' version of the ooxml-schemas jar. The idea is simple: remove all unused classes and resources from the jar generated by XMLBeans. Rough estimates made at the BarCamp showed that POI uses less than 30% of the OOXML schemas, so the optimized jar should be significantly smaller.

With this in mind, I created a simple utility called OOXMLLite; see http://svn.apache.org/repos/asf/poi/trunk/src/ooxml/java/org/apache/poi/util/OOXMLLite.java

The process consists of four simple steps:

 - run all ooxml unit tests
 - record which classes from ooxml-schemas.jar are loaded into the JVM
 - copy the loaded classes into a separate directory
 - copy the binary resources (.xsb)
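
The class-copying step can be sketched roughly as follows. This is a minimal illustration, not the code in OOXMLLite.java: the class name LiteCopySketch and both method names are hypothetical, and the list of loaded class names is assumed to have been collected already (for example from the JVM's -verbose:class output).

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.Collection;

// Hypothetical sketch: copy the .class files of a known set of loaded classes
// into a destination directory, preserving the package layout.
class LiteCopySketch {

    // Convert a fully qualified class name into its resource path.
    static String resourcePath(String className) {
        return className.replace('.', '/') + ".class";
    }

    // Copy each named class that the given loader can find; return how many were copied.
    static int copyClasses(Collection<String> classNames, ClassLoader loader, Path destDir)
            throws IOException {
        int copied = 0;
        for (String name : classNames) {
            String resource = resourcePath(name);
            try (InputStream in = loader.getResourceAsStream(resource)) {
                if (in == null) {
                    continue; // class bytes are not available from this loader; skip it
                }
                Path target = destDir.resolve(resource);
                Files.createDirectories(target.getParent());
                Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
                copied++;
            }
        }
        return copied;
    }
}
```

The destination directory can then be packed into a jar together with the .xsb resources.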

A good acceptance test is to run the ooxml unit tests against the 'lite' classes: all of them should pass. There is an accompanying Ant task, ooxml-xsds-lite, for that; see build.xml.

The resulting 'lite' jar is much smaller: ooxml-schemas-lite-3.6-beta1.jar is only 3.5 MB, while the 'big' ooxml-schemas-1.0.jar is 14.5 MB. In theory, the size can be trimmed below 3 MB, because my utility currently copies all .xsb files and does not yet track resource dependencies.
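
To see how much of a jar is taken up by .xsb resources compared with .class files, something like the following could be used. It is only a sketch with a hypothetical class name (JarSizeByExtension); it relies on the standard java.util.zip API and sums the compressed size of the entries per file extension.

```java
import java.io.IOException;
import java.util.Enumeration;
import java.util.Map;
import java.util.TreeMap;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

// Hypothetical sketch: total compressed bytes per file extension inside a jar.
class JarSizeByExtension {

    static Map<String, Long> compressedBytesByExtension(String jarPath) throws IOException {
        Map<String, Long> totals = new TreeMap<>();
        try (ZipFile zip = new ZipFile(jarPath)) {
            Enumeration<? extends ZipEntry> entries = zip.entries();
            while (entries.hasMoreElements()) {
                ZipEntry entry = entries.nextElement();
                if (entry.isDirectory()) {
                    continue;
                }
                String name = entry.getName();
                int slash = name.lastIndexOf('/');
                int dot = name.lastIndexOf('.');
                // Only treat a dot in the last path segment as an extension separator.
                String ext = dot > slash ? name.substring(dot) : "(none)";
                // getCompressedSize() returns -1 if unknown; count that as 0.
                long size = Math.max(entry.getCompressedSize(), 0L);
                totals.merge(ext, size, Long::sum);
            }
        }
        return totals;
    }
}
```

Running it against the 'big' and the 'lite' jar and comparing the ".xsb" totals would give a rough idea of how much could still be saved by tracking resource dependencies.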

I propose to include ooxml-schemas-lite in the release cycle. The artifact name is ooxml-schemas-lite-${version.id}.jar.
Interested projects (first of all I mean Apache Tika) can set up their Maven poms to use <artifactId>poi-ooxml-lite</artifactId> instead of <artifactId>poi-ooxml</artifactId>. This will reduce the distribution size by approximately 10 MB.

Yegor
