Hi Yegor,
I've been digging deeper into the dependencies in Maven. I think
that "lite" should become the usual way to build.
(1) maven/poi-ooxml.pom is missing two dependencies:
xmlbeans-2.3.0.jar
<property name="ooxml.xmlbeans.jar" location="${ooxml.lib}/xmlbeans-2.3.0.jar"/>
<property name="ooxml.xmlbeans.url" value="${repository.m2}/maven2/org/apache/xmlbeans/xmlbeans/2.3.0/xmlbeans-2.3.0.jar"/>
It is a chained dependency: xmlbeans-2.3.0.jar is included in
ooxml-schemas.pom.
geronimo-stax-api_1.0_spec-1.0.jar
<property name="ooxml.jsr173.jar" location="${ooxml.lib}/geronimo-stax-api_1.0_spec-1.0.jar"/>
<property name="ooxml.jsr173.url" value="${repository.m2}/maven2/org/apache/geronimo/specs/geronimo-stax-api_1.0_spec/1.0/geronimo-stax-api_1.0_spec-1.0.jar"/>
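Both *.url properties follow the standard Maven 2 repository layout:
the groupId with its dots turned into slashes, then the artifactId, the
version, and finally artifactId-version.jar. A small sketch of that
mapping reproduces the two paths above; the class and method names are
hypothetical and not part of POI's build.

```java
/** Hypothetical helper: maps Maven coordinates to a repository-relative jar path. */
public class MavenRepoPath {

    /**
     * Returns the path of a jar relative to the repository root, i.e. the
     * part that follows ${repository.m2}/maven2/ in the properties above.
     */
    public static String jarPath(String groupId, String artifactId, String version) {
        // "org.apache.xmlbeans" -> "org/apache/xmlbeans"
        String groupPath = groupId.replace('.', '/');
        return groupPath + "/" + artifactId + "/" + version + "/"
                + artifactId + "-" + version + ".jar";
    }

    public static void main(String[] args) {
        System.out.println(jarPath("org.apache.xmlbeans", "xmlbeans", "2.3.0"));
        System.out.println(jarPath("org.apache.geronimo.specs",
                "geronimo-stax-api_1.0_spec", "1.0"));
    }
}
```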
> Is there a reason we left these out of the pom?
We need geronimo-stax only to build ooxml-schemas.jar; it is not
needed at runtime. Maven users don't need it either.
Which build targets require JSR 173? Those will download geronimo-stax
and the schemas to build ooxml-schemas.
I propose to include ooxml-schemas-lite in the release cycle. The
artifact name is ooxml-schemas-lite-${version.id}.jar.
Interested projects (first of all I mean Apache Tika) can set up
their Maven poms to use <artifactId>poi-ooxml-lite</artifactId>
instead of <artifactId>poi-ooxml</artifactId>. This will reduce
the distribution size by approximately 10 MB.
(2) You propose a new artifact-id of ooxml-schemas-lite. I think a
name like ooxml-poi, poi-ooxml-schemas, or poi-opc would be better.
There are a few points to make here:
- ooxml-schemas has its own versioning: it is version 1.0. It
should not change much. We should have a documented build target
for this.
- ooxml lite should follow the POI versioning scheme, since newer
versions of POI will cover more of the schema. So it is not really
a subset of ooxml-schemas so much as a cross-reference between
ooxml-schemas and poi-ooxml.
Which version should poi-ooxml use: "lite" or ooxml-schemas? I think
we should always use "lite" and distribute "lite". We can put the
"lite" classes in one of two places:
My plan was to distribute 'lite' only as a supplemental jar. Also, I
will consider switching the project to use "lite" only when I have
feedback from users.
OK. Then it will be made by a new build target. What is that target?
(a) In the poi-ooxml jar as part of that build.
(b) In its own jar under a new Maven artifact-id. I like ooxml-poi.
I think (b) is better, but if a user is working on ooxml support in
poi-ooxml, then it is likely that they will be covering parts of the
schema not yet covered by "lite".
Users will still want to work with the full schemas, so they need to
make a choice when they build: either with a special target or by
copying the big jar into ooxml-lib/.
Development builds should always use the "full" jar, and /ooxml-lib
should contain ooxml-schemas-1.0.jar and no other versions of
ooxml-schemas. Actually, it is the only way it can work: the "lite"
jar is a derivative of the "full" jar, not an alternative to it.
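Since the "lite" jar is derived from the "full" jar, every entry in it
should also exist in ooxml-schemas-1.0.jar. A minimal sketch of that
sanity check with java.util.jar follows; the class name and the idea of
running it as a build check are assumptions, not an existing POI target,
and it compares entry names only, not their contents.

```java
import java.io.IOException;
import java.util.Collection;
import java.util.Enumeration;
import java.util.Set;
import java.util.TreeSet;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;

/** Hypothetical check: is every entry of the lite jar also present in the full jar? */
public class LiteJarCheck {

    /** Collects the names of all non-directory entries in a jar file. */
    public static Set<String> entryNames(String jarPath) throws IOException {
        Set<String> names = new TreeSet<>();
        try (JarFile jar = new JarFile(jarPath)) {
            Enumeration<JarEntry> entries = jar.entries();
            while (entries.hasMoreElements()) {
                JarEntry entry = entries.nextElement();
                if (!entry.isDirectory()) {
                    names.add(entry.getName());
                }
            }
        }
        return names;
    }

    /** Returns the lite entries missing from the full jar; empty means "derivative". */
    public static Set<String> notInFull(Collection<String> liteEntries,
                                        Collection<String> fullEntries) {
        Set<String> extra = new TreeSet<>(liteEntries);
        extra.removeAll(fullEntries);
        return extra;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: LiteJarCheck <full.jar> <lite.jar>");
            System.exit(2);
        }
        Set<String> extra = notInFull(entryNames(args[1]), entryNames(args[0]));
        if (extra.isEmpty()) {
            System.out.println("OK: lite jar is a subset of the full jar");
        } else {
            System.out.println("Not in full jar: " + extra);
            System.exit(1);
        }
    }
}
```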
OK. The lite jar is acceptable as a supplemental jar, and I will
"document" why it should be used.
Is there a reason for Maven users to care about the "lite" jar? If
there is, then we need to worry about alternative dependencies for
ooxml-schemas. If not, then we are good.
In general, users will want to use the "lite" jar. We can provide
access to the full ooxml-schemas as a replacement. Is it possible to
have "selective" targets in a Maven pom? Can we make poi-ooxml
depend on either "ooxml-poi" or "ooxml-schemas"?
For the build, I think an explicit target called "ooxml" should be
used. It will perform your full task and make sure that the build
environment is using "lite" and not "full". I suspect that this
target may move some files around. We'll need to explain that
adding support for parts of the schema means adding unit tests.
These unit tests should also help us document the OOXML formats.
(2) I am reworking the home page. It has a table of components:
Document -- Component -- JAR -- Maven artifactId
OLE2 Filesystem -- POIFS -- poi-version-yyyymmdd.jar -- poi
OLE2 Property Sets -- HPSF -- poi-version-yyyymmdd.jar -- poi
Excel XLS -- HSSF -- poi-version-yyyymmdd.jar -- poi
Excel XLSX -- XSSF -- poi-ooxml-version-yyyymmdd.jar -- poi-ooxml
PowerPoint PPT -- HSLF -- poi-scratchpad-version-yyyymmdd.jar -- poi-scratchpad
PowerPoint PPTX -- XSLF -- poi-ooxml-version-yyyymmdd.jar -- poi-ooxml
Word DOC -- HWPF -- poi-scratchpad-version-yyyymmdd.jar -- poi-scratchpad
Word DOCX -- XWPF -- poi-ooxml-version-yyyymmdd.jar -- poi-ooxml
Visio VSD -- HDGF -- poi-scratchpad-version-yyyymmdd.jar -- poi-scratchpad
Publisher PUB -- HPBF -- poi-scratchpad-version-yyyymmdd.jar -- poi-scratchpad
Outlook MSG -- HSMF -- poi-scratchpad-version-yyyymmdd.jar -- poi-scratchpad
I am missing the OOXML schemas in my list. With this new lite
version I need two rows.
OOXML Schemas -- OpenXML4J -- ooxml-schemas-yyyymmdd.jar -- poi-ooxml
OOXML Lite -- OpenXML4J -- ooxml-schemas-lite-yyyymmdd.jar -- poi-ooxml-lite
We will need to include poi-ooxml-version-yyyymmdd.jar in the
poi-ooxml-lite target as well. I'll mark the XSSF, XWPF, and XSLF
rows appropriately.
Correct?
Not quite.
OpenXML4J is a general-purpose API for working with OPC packages; it
is a direct counterpart of POIFS. So it should stay separate.
And that is part of poi-ooxml. Correct?
As for the OOXML Schemas, I would rather not advertise them on the
web site; they are a detail of our internal implementation. Users
are advised to use the common interfaces.
I will document them on the website because they are dependencies of
the Maven poms.
I will say that one would not normally want to build ooxml-schemas.
However, I could see people wanting to understand what changing the
schema might mean.
Also, with a lite version I can see a developer of new ooxml
features wanting to test their coverage and write unit tests.
Regards,
Dave
Yegor
(3) I'll rewrite your description as a new page within the
currently very sparse OOXML documentation.
BTW, the www.openxml4j.org domain has gone away, and I am going to
need your help deciding which additional documentation and OPC
examples we should include for the OOXML sub-project.
Regards,
Dave
On Nov 16, 2009, at 8:53 AM, Yegor Kozlov wrote:
Hi All,
As we discussed at ApacheCon, one way to optimize the size of POI
distributions is to create a 'lite' version of the ooxml-schemas
jar.
The idea is simple: remove all unused classes and resources from
the jar generated by XMLBeans. Rough estimates made at the
Barcamp showed that POI uses less than 30% of the OOXML schemas,
hence the optimized jar should be significantly smaller.
With this in mind, I created a simple utility called OOXMLLite;
see http://svn.apache.org/repos/asf/poi/trunk/src/ooxml/java/org/apache/poi/util/OOXMLLite.java
The process includes four simple steps:
- run all ooxml unit tests
- see what classes from the ooxml-schemas.jar are loaded in the JVM
- copy the loaded classes into some directory.
- copy the binary resources (.xsb)
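The steps above can be sketched as follows. This is not the code in
OOXMLLite.java: as an assumption, it reads class names from the JVM's
-verbose:class output in the JDK 6 to 8 format ("[Loaded <class> from
<location>]"), keeps the classes loaded from the schemas jar, and then
selects the jar entries to copy: the .class file of every loaded class
plus every .xsb resource, since resource dependencies are not tracked.
The class and method names, and the sample data in main, are
hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical sketch of the class-selection part of a "lite" build. */
public class LiteSelector {

    private static final String LOADED = "[Loaded ";
    private static final String FROM = " from ";

    /** Step 2: classes whose -verbose:class source location contains jarName. */
    public static Set<String> loadedFrom(List<String> verboseLines, String jarName) {
        Set<String> classes = new TreeSet<>();
        for (String line : verboseLines) {
            if (!line.startsWith(LOADED) || !line.endsWith("]")) {
                continue; // not a class-loading line
            }
            int from = line.indexOf(FROM, LOADED.length());
            if (from < 0) {
                continue;
            }
            String className = line.substring(LOADED.length(), from);
            String source = line.substring(from + FROM.length(), line.length() - 1);
            if (source.contains(jarName)) {
                classes.add(className);
            }
        }
        return classes;
    }

    /** Steps 3 and 4: .class entries of loaded classes, plus every .xsb resource. */
    public static List<String> selectEntries(Collection<String> jarEntries,
                                             Set<String> loadedClasses) {
        List<String> keep = new ArrayList<>();
        for (String entry : jarEntries) {
            if (entry.endsWith(".xsb")) {
                keep.add(entry); // all binary schema resources are kept
            } else if (entry.endsWith(".class")) {
                // "a/b/C$D.class" -> binary class name "a.b.C$D"
                String binaryName = entry
                        .substring(0, entry.length() - ".class".length())
                        .replace('/', '.');
                if (loadedClasses.contains(binaryName)) {
                    keep.add(entry);
                }
            }
        }
        return keep;
    }

    public static void main(String[] args) {
        // Made-up log lines and jar entries, for illustration only.
        List<String> log = Arrays.asList(
                "[Loaded java.lang.Object from shared objects file]",
                "[Loaded org.example.CTUsed from file:/lib/ooxml-schemas-1.0.jar]",
                "[Loaded org.example.CTUsed$Factory from file:/lib/ooxml-schemas-1.0.jar]");
        Set<String> loaded = loadedFrom(log, "ooxml-schemas-1.0.jar");
        List<String> entries = Arrays.asList(
                "org/example/CTUsed.class",
                "org/example/CTUsed$Factory.class",
                "org/example/CTUnused.class",
                "schemaorg_apache_xmlbeans/system/example/ctused.xsb");
        System.out.println(selectEntries(entries, loaded));
    }
}
```

Writing the selected entries into a new jar (for example with
java.util.jar.JarOutputStream) is left out to keep the sketch short.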
A good acceptance test is to run the ooxml unit tests against the
'lite' classes - all should pass. There is an accompanying Ant
task ooxml-xsds-lite for that, see build.xml.
The resulting 'lite' jar is much smaller: ooxml-schemas-lite-3.6-beta1.jar
is only 3.5 MB, while the 'big' ooxml-schemas-1.0.jar is 14.5 MB.
In theory, the size can be trimmed to below 3 MB: my utility copies
all .xsb files and does not yet track resource dependencies.
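For reference, the saving implied by these two sizes works out as
follows, which is consistent with the "approximately 10 MB" estimate in
the proposal:

```latex
\Delta = 14.5\,\text{MB} - 3.5\,\text{MB} = 11\,\text{MB},
\qquad
\frac{3.5\,\text{MB}}{14.5\,\text{MB}} \approx 0.24
```

So the lite jar is roughly a quarter of the size of the full jar.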
I propose to include ooxml-schemas-lite in the release cycle. The
artifact name is ooxml-schemas-lite-${version.id}.jar.
Interested projects (first of all I mean Apache Tika) can set up
their Maven poms to use <artifactId>poi-ooxml-lite</artifactId>
instead of <artifactId>poi-ooxml</artifactId>. This will reduce
the distribution size by approximately 10 MB.
Yegor
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]