Hello POI devs!
The patch I'm currently working on changes how pictures are handled
within the XWPF classes. Right now, anyone who wants to add an image
first has to relate the image data to the corresponding PackagePart,
using the methods in XWPFDocument that take an input stream or a file
path as argument. Only then can the appropriate XML be added, containing
the OPC package relationship id that was created before.
The current implementation has some flaws / disadvantages:
1. The image is always added to the package as a new part. But Office is
able to reuse an image if it is used multiple times (e.g. in
different kinds of headers, footers, as a watermark, etc.). I implemented
a checksum-based comparison that runs whenever an image is added. As the
XWPFDocument represents both the document.xml main part and the package
itself, it is therefore responsible for package-wide tasks, like
managing all images known to the package.
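The checksum-based reuse from item 1 could look roughly like the
following sketch. It is not the code from the patch: the class name
ImageRegistry, the method names and the "/word/media/imageN" naming are
assumptions made for illustration, and CRC32 is just one possible
checksum. The bytes are compared as well, so a checksum collision does
not merge two different images.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.zip.CRC32;

// Hypothetical package-wide image registry; not part of the POI API.
class ImageRegistry {
    // One stored image: its bytes and the part name it was filed under.
    private static final class Entry {
        final byte[] data;
        final String partName;

        Entry(byte[] data, String partName) {
            this.data = data;
            this.partName = partName;
        }
    }

    // Checksum -> all images with that checksum (usually exactly one).
    private final Map<Long, List<Entry>> byChecksum = new HashMap<>();
    private int nextIndex = 1;
    private int partCount = 0;

    // Returns the part name for the image. An existing part is reused
    // if an image with identical bytes was added before.
    String addImage(byte[] data, String extension) {
        CRC32 crc = new CRC32();
        crc.update(data);
        List<Entry> candidates =
                byChecksum.computeIfAbsent(crc.getValue(), k -> new ArrayList<>());
        for (Entry e : candidates) {
            if (Arrays.equals(e.data, data)) {
                return e.partName; // same image, reuse the part
            }
        }
        // Illustrative naming only; a real package would ask OPC for a free name.
        String partName = "/word/media/image" + (nextIndex++) + "." + extension;
        candidates.add(new Entry(data.clone(), partName));
        partCount++;
        return partName;
    }

    // Number of distinct image parts created so far.
    int partCount() {
        return partCount;
    }
}
```

Adding the same bytes twice returns the same part name both times, so
header, footer and main document can all relate to a single part.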
2. When starting to reuse parts, it becomes essential not to remove
parts which are still in use. I added a usage counter to
POIXMLDocumentPart, which deletes a DocumentPart (along with the
corresponding PackagePart) only if no more relations point to that part.
Currently, if one creates a new document in Word, adds a picture to both
header and main document, and then removes the relation from the
document to that image, the header is broken because the image part was
removed as well.
In PowerPoint this becomes essential, as SlideLayout, SlideMaster and
Slide parts are related to each other. Removing the SlideLayout
relation from a Slide would remove the SlideLayout part completely,
regardless of whether it is still in use by another 40 slides.
An alternative to the relation counter would be to make package
relationships bidirectional. When a package is loaded, every PackagePart
has its outgoing relations (already available); creating these would
also add an entry to the "incoming relations" of the relation's target
part (this would be new). I decided to use the simple relation counter,
as it is easier to handle than hand-written code for bidirectional
relations. When such code comes into play, I prefer to generate it
with a framework like EMF rather than reinvent the wheel. With
proper generator settings, EMF code can be used without Eclipse or even
without EMF at all, afaik.
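To illustrate the usage counter from item 2, here is a minimal sketch.
The class CountedPart and its methods are hypothetical stand-ins, not
the actual changes to POIXMLDocumentPart; the sketch only shows the rule
that a part is deleted when its last incoming relation is removed.

```java
// Hypothetical stand-in for a document part with a usage counter.
class CountedPart {
    private final String name;
    private int usageCount = 0;
    private boolean deleted = false;

    CountedPart(String name) {
        this.name = name;
    }

    // Called whenever another part creates a relation to this part.
    void addIncomingRelation() {
        if (deleted) {
            throw new IllegalStateException(name + " was already deleted");
        }
        usageCount++;
    }

    // Called whenever a relation to this part is removed.
    // Returns true if this was the last relation and the part was deleted.
    boolean removeIncomingRelation() {
        if (usageCount == 0) {
            throw new IllegalStateException(name + " has no incoming relations");
        }
        usageCount--;
        if (usageCount == 0) {
            deleted = true; // a real implementation would drop the PackagePart here
        }
        return deleted;
    }

    int getUsageCount() {
        return usageCount;
    }

    boolean isDeleted() {
        return deleted;
    }
}
```

With a layout referenced by two slides, removing the relation from the
first slide leaves the part in place; only removing the second one
deletes it.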
3. The new code added to XWPFRun in revision 1092755 sets the
drawing and picture IDs to the position of the image in the
XWPFDocument's picture list. This might be correct, but it need not be.
Looking into the OOXML specification, this identifier is specified as
unique across the whole DOCX. At least, that's how Word creates it. If
all identifiers are set to "0", the DOCX file will still open properly
in Word, although I did not test whether other functionality breaks
because Word expects the id to be unique.
To avoid any trouble, I added an "IdentifierManager" class, where IDs
can be registered, reserved or removed when the XML is loaded or
modified. XWPFDocument holds an instance of the IdentifierManager to
manage all drawing IDs. This allows all DrawingML object IDs to be
handled in one package-wide, central place.
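A minimal sketch of such an ID manager, to make the three operations
concrete. This is not the IdentifierManager from the patch: the class
name SimpleIdManager, the method names and the BitSet-based bookkeeping
are assumptions chosen for illustration.

```java
import java.util.BitSet;

// Hypothetical ID manager; the real IdentifierManager may look different.
class SimpleIdManager {
    private final BitSet used = new BitSet();
    private final int lowerBound;

    // lowerBound is the smallest ID this manager hands out (must be >= 0).
    SimpleIdManager(int lowerBound) {
        if (lowerBound < 0) {
            throw new IllegalArgumentException("lowerBound must not be negative");
        }
        this.lowerBound = lowerBound;
    }

    // Registers an ID found while loading existing XML.
    // Returns false if the ID is already taken.
    boolean register(int id) {
        if (id < lowerBound) {
            throw new IllegalArgumentException("id below lower bound: " + id);
        }
        if (used.get(id)) {
            return false;
        }
        used.set(id);
        return true;
    }

    // Reserves and returns the lowest ID that is still free.
    int reserveNew() {
        int id = used.nextClearBit(lowerBound);
        used.set(id);
        return id;
    }

    // Releases an ID, e.g. when a drawing is removed.
    // Returns true if the ID was in use.
    boolean release(int id) {
        if (id < lowerBound) {
            return false;
        }
        boolean wasUsed = used.get(id);
        used.clear(id);
        return wasUsed;
    }
}
```

While loading a document, every drawing ID found in the XML would be
passed to register(); new pictures would call reserveNew(), so no ID is
ever handed out twice within the package.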
4. As a POI API "user", I would like to add a picture to the XWPFRun. I
give the file, I choose the target run, I give the inline / anchor
position where I expect the image to pop up. POI should take care of the
rest. I do not want to create parts, relate parts and create XML
manually. If I had to, I could just as well stick to the XMLBeans
classes in the first place.
Issue #4 is still mostly missing, including unit tests, but
I'm working on it: add the CTPicture to the XML, add an XWPFPicture to
the XWPFRun, create a new PackagePart for the image, relate the
XWPFPictureData to the part that the XWPFRun belongs to,
etc. To allow this, I want to add "RunElements" as generic
elements that may appear beneath a CTR. Loading a CTR as an XWPFRun
then lets you traverse its content, like an "XWPFText", a break, etc. It
also lets you add / insert specific content into this run, using POI
objects from another document or new POI objects instead of using
XmlCursor and the generic CT classes.
Are there any concerns, proposals or hints regarding this approach?
Kind regards,
Stefan Stern