It has finally settled in my head. I finished improving build.xml, and everything seems to be
working OK.
I named the new artifact poi-ooxml-schemas. The poi- prefix clearly indicates that it is a POI derivative of
ooxml-schemas.jar.
I also think it is reasonable to include poi-ooxml-schemas in the dist target and make it the default provider of
the OOXML XML beans. This means that the "big" ooxml-schemas-1.0.jar is used only for development. Normal POI releases,
as well as the Maven POMs, will use the "lite" jar.
Below is the current structure of a POI release bundle:
#lib is unchanged
lib/commons-logging-1.1.jar
lib/junit-3.8.1.jar
lib/log4j-1.2.13.jar
#ooxml-schemas-1.0.jar is excluded
ooxml-lib/dom4j-1.6.1.jar
ooxml-lib/geronimo-stax-api_1.0_spec-1.0.jar
ooxml-lib/xmlbeans-2.3.0.jar
poi-3.6-beta1-20091124.jar
poi-scratchpad-3.6-beta1-20091124.jar
poi-contrib-3.6-beta1-20091124.jar
poi-ooxml-3.6-beta1-20091124.jar
poi-ooxml-schemas-3.6-beta1-20091124.jar #new artifact, replaces ooxml-schemas-1.0.jar
poi-examples-3.6-beta1-20091124.jar #new artifact, was requested in Bugzilla
For Maven this change is transparent: the POM for the poi-ooxml module depends on poi-ooxml-schemas instead of
ooxml-schemas. Maven users will only need to update the version of POI from 3.5-FINAL to 3.6; the rest will
be handled by Maven automatically.
Yegor
Hi Yegor,
+1
This will have effects on the website rewrite.
(1) The "How to Build" page has a list of common targets. Here is what I
have currently:
clean -- Erase all build work products (i.e. everything in the build directory)
compile -- Compiles all files from main, contrib and scratchpad
test -- Run all unit tests from main, contrib and scratchpad (JUnit)
jar -- Produce jar files
docs -- Generate all documentation for the system (Apache Forrest)
dist -- Create a distribution (JUnit and Apache Forrest)
This should always be part of the dist target. Should we add a target
for building a "lite" ooxml, or should this always be part of jar and test?
I think we should have a "lite" target separate from jar and test.
(2) I am reworking the home page. There is a table of components that
appear there.
Document -- Component -- JAR -- Maven artifactId
OLE2 Filesystem -- POIFS -- poi-version-yyyymmdd.jar -- poi
OLE2 Property Sets -- HPSF -- poi-version-yyyymmdd.jar -- poi
Excel XLS -- HSSF -- poi-version-yyyymmdd.jar -- poi
Excel XLSX -- XSSF -- poi-ooxml-version-yyyymmdd.jar -- poi-ooxml
PowerPoint PPT -- HSLF -- poi-scratchpad-version-yyyymmdd.jar -- poi-scratchpad
PowerPoint PPTX -- XSLF -- poi-ooxml-version-yyyymmdd.jar -- poi-ooxml
Word DOC -- HWPF -- poi-scratchpad-version-yyyymmdd.jar -- poi-scratchpad
Word DOCX -- XWPF -- poi-ooxml-version-yyyymmdd.jar -- poi-ooxml
Visio VSD -- HDGF -- poi-scratchpad-version-yyyymmdd.jar -- poi-scratchpad
Publisher PUB -- HPBF -- poi-scratchpad-version-yyyymmdd.jar -- poi-scratchpad
Outlook MSG -- HSMF -- poi-scratchpad-version-yyyymmdd.jar -- poi-scratchpad
I am missing the OOXML schemas in my list. With this new lite version I
need two rows.
OOXML Schemas -- OpenXML4J -- ooxml-schemas-yyyymmdd.jar -- poi-ooxml
OOXML Lite -- OpenXML4J -- ooxml-schemas-lite-yyyymmdd.jar -- poi-ooxml-lite
We will need to include poi-ooxml-version-yyyymmdd.jar in the
poi-ooxml-lite target as well. I'll mark the XLSX, XWPF, and XSLF rows
appropriately.
Correct?
(3) I'll rewrite your description as a new page within the currently
very sparse OOXML documentation.
BTW - the www.openxml4j.org domain has gone away, and I am going to need
your help in deciding which additional documentation and OPC examples
we should include for the OOXML sub-project.
Regards,
Dave
On Nov 16, 2009, at 8:53 AM, Yegor Kozlov wrote:
Hi All,
As we discussed at Apachecon, one way to optimize the size of POI
distributions is to create a 'lite' version of the ooxml-schemas jar.
The idea is simple: remove all unused classes and resources from the
jar generated by XMLBeans. Rough estimations made at the Barcamp
showed that POI uses less than 30% of the OOXML schemas, hence the
optimized jar should be significantly smaller.
With this in mind I created a simple utility called OOXMLLite, see
http://svn.apache.org/repos/asf/poi/trunk/src/ooxml/java/org/apache/poi/util/OOXMLLite.java
The process includes four simple steps:
- run all ooxml unit tests
- see which classes from the ooxml-schemas.jar are loaded in the JVM
- copy the loaded classes into some directory
- copy the binary resources (.xsb)
A good acceptance test is to run the ooxml unit tests against the
'lite' classes - all should pass. There is an accompanying Ant task
ooxml-xsds-lite for that, see build.xml.
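
The copy steps above can be sketched roughly as follows. This is only an illustration, not the actual OOXMLLite code: it assumes the 'big' jar has already been unpacked into a directory and that the names of the loaded classes have been collected some other way. The class LiteCopySketch and both of its methods are hypothetical names.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper, not part of POI: copies only the classes that were
// actually loaded, plus all .xsb resources, into a 'lite' directory.
class LiteCopySketch {

    // Maps a fully qualified class name to its relative .class file path.
    static String toClassFile(String className) {
        return className.replace('.', '/') + ".class";
    }

    // Copies the .class file of each listed class from srcRoot to destRoot,
    // keeping the package layout. Names with no file under srcRoot (for
    // example JDK classes) are skipped. Returns the number of files copied.
    static int copyLoadedClasses(List<String> classNames, Path srcRoot, Path destRoot)
            throws IOException {
        int copied = 0;
        for (String name : classNames) {
            String relative = toClassFile(name);
            Path src = srcRoot.resolve(relative);
            if (!Files.isRegularFile(src)) {
                continue; // not a class from the unpacked schemas jar
            }
            Path dest = destRoot.resolve(relative);
            Files.createDirectories(dest.getParent());
            Files.copy(src, dest, StandardCopyOption.REPLACE_EXISTING);
            copied++;
        }
        return copied;
    }

    // Copies every .xsb file under srcRoot to the same relative location
    // under destRoot, without tracking which ones are really needed.
    // Returns the number of files copied.
    static int copyXsbResources(Path srcRoot, Path destRoot) throws IOException {
        List<Path> xsbFiles;
        try (Stream<Path> walk = Files.walk(srcRoot)) {
            xsbFiles = walk
                    .filter(p -> Files.isRegularFile(p)
                            && p.getFileName().toString().endsWith(".xsb"))
                    .collect(Collectors.toList());
        }
        for (Path src : xsbFiles) {
            Path dest = destRoot.resolve(srcRoot.relativize(src).toString());
            Files.createDirectories(dest.getParent());
            Files.copy(src, dest, StandardCopyOption.REPLACE_EXISTING);
        }
        return xsbFiles.size();
    }
}
```

The resulting directory could then be packed into a jar and the ooxml unit tests run against it as the acceptance check.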
The resulting 'lite' jar is much smaller:
ooxml-schemas-lite-3.6-beta1.jar is only 3.5 MB while the 'big'
ooxml-schemas-1.0.jar is 14.5 MB. In theory, the size can be trimmed
down below 3 MB - my utility copies all .xsb files and does not yet
track resource dependencies.
I propose to include ooxml-schemas-lite in the release cycle. The
artifact name is ooxml-schemas-lite-${version.id}.jar.
Interested projects (first of all I mean Apache Tika) can set up their
Maven POMs to use <artifactId>poi-ooxml-lite</artifactId> instead of
<artifactId>poi-ooxml</artifactId>. This will reduce the distribution
size by approximately 10 MB.
Yegor
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]