Hello POI devs!

The patch I'm currently working on changes how pictures are handled within the XWPF classes. Right now, anyone who wants to add an image first has to associate the image data with the corresponding PackagePart, using the methods in XWPFDocument that take an input stream or a file path as argument. Only then can the appropriate XML be added, containing the OPC package relationship ID that was created in the first step.

The current implementation has some flaws / disadvantages:
1. The image is always added to the package as a new part. Office, however, is able to reuse an image that is used multiple times (e.g. in different kinds of headers and footers, as a watermark, etc.). I implemented a checksum-based comparison that runs whenever an image is added. Because XWPFDocument represents both the document.xml main part and the package itself, it became responsible for package-wide tasks such as managing all images known to the package.

2. Once parts are reused, it becomes essential not to remove parts that are still in use. I added a usage counter to POIXMLDocumentPart, which deletes a DocumentPart (and the corresponding PackagePart along with it) only if no more relations point to that part. Currently, if you create a new document in Word, add the same picture to both the header and the main document, and then remove the relation from the document to that image, the header is broken because the image part was removed. In PowerPoint this is even more important, as SlideLayout, SlideMaster and Slide parts are related to each other. Removing the SlideLayout relation from a Slide would remove the SlideLayout part completely, regardless of whether it is still used by another 40 slides. An alternative to the relation counter would be to make package relationships bidirectional. When a package is loaded, every PackagePart would have its outgoing relations (already available), and creating each of them would also add an entry to the "incoming relations" of the relation's target part (this would be new). I decided to use the simple relation counter, as it is easier to handle than hand-written code for bidirectional relations. When that kind of code is needed, I prefer to generate it with a framework like EMF rather than reinvent the wheel. With proper generator settings, EMF code can be used without Eclipse or even without EMF at all, afaik.

3. The new code added to XWPFRun in revision 1092755 sets the drawing and picture IDs to the position of the image in the XWPFDocument's picture list. This may yield correct IDs, but it does not have to. According to the OOXML specification, this identifier is unique across the whole DOCX. At least, that's how Word creates it. If all identifiers are set to "0", the DOCX file still opens properly in Word, although I did not test whether other functionality breaks because Word expects the IDs to be unique. To avoid any trouble, I added an "IdentifierManager" class, where IDs can be registered, reserved or removed whenever the XML is loaded or modified. XWPFDocument holds an instance of the IdentifierManager to manage all drawing IDs. This makes it possible to handle the IDs of all DrawingML objects in one central, package-wide place.

4. As a POI API "user", I would like to add a picture to the XWPFRun. I provide the file, I choose the target run, and I give the inline / anchor position where I expect the image to show up. POI should take care of the rest. I do not want to create parts, relate parts and create XML manually. If I have to do all that, I might as well stick to the XMLBeans classes in the first place.
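
To make points 1 to 3 more concrete, here is a minimal sketch in plain Java. It is not the patch itself and uses no POI classes; PicturePool, IdManager and the generated part names are hypothetical names chosen for illustration only. The pool reuses a stored picture when the bytes match (checksum first, then a byte comparison to rule out collisions) and deletes it only when its usage counter drops to zero. The ID manager hands out IDs that are unique within one instance, using plain int values for simplicity.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.BitSet;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.zip.CRC32;

// Hypothetical sketch; none of these names are existing POI API.
class PicturePoolSketch {

    /** One stored picture plus the number of relations pointing to it. */
    static final class Entry {
        final byte[] data;
        final long checksum;
        final String partName;
        int usageCount = 1;

        Entry(byte[] data, long checksum, String partName) {
            this.data = data;
            this.checksum = checksum;
            this.partName = partName;
        }
    }

    /** Package-wide picture store: identical bytes share one part. */
    static final class PicturePool {
        private final Map<Long, List<Entry>> byChecksum = new HashMap<>();
        private final Map<String, Entry> byName = new HashMap<>();
        private int nextIndex = 1;

        /** Returns the part name for the data, reusing an existing part if possible. */
        String addPicture(byte[] data) {
            CRC32 crc = new CRC32();
            crc.update(data);
            long checksum = crc.getValue();
            List<Entry> bucket = byChecksum.computeIfAbsent(checksum, k -> new ArrayList<>());
            for (Entry e : bucket) {
                // Checksums can collide, so confirm with a full comparison.
                if (Arrays.equals(e.data, data)) {
                    e.usageCount++;
                    return e.partName;
                }
            }
            // Illustrative part name; a real package would pick name and extension itself.
            Entry created = new Entry(data.clone(), checksum, "/word/media/image" + nextIndex++ + ".png");
            bucket.add(created);
            byName.put(created.partName, created);
            return created.partName;
        }

        /** Drops one relation; returns true only if the part itself was deleted. */
        boolean release(String partName) {
            Entry e = byName.get(partName);
            if (e == null) {
                throw new IllegalArgumentException("unknown part: " + partName);
            }
            if (--e.usageCount > 0) {
                return false; // still referenced from somewhere else
            }
            byName.remove(partName);
            byChecksum.get(e.checksum).remove(e);
            return true;
        }

        int partCount() {
            return byName.size();
        }
    }

    /** Central registry for drawing IDs. */
    static final class IdManager {
        private final BitSet used = new BitSet();

        /** Registers an ID found while loading; returns false if it was already taken. */
        boolean register(int id) {
            if (used.get(id)) {
                return false;
            }
            used.set(id);
            return true;
        }

        /** Reserves and returns the lowest free ID, starting at 1. */
        int reserveNew() {
            int id = used.nextClearBit(1);
            used.set(id);
            return id;
        }

        void release(int id) {
            used.clear(id);
        }
    }

    public static void main(String[] args) {
        PicturePool pool = new PicturePool();
        byte[] logo = "logo-bytes".getBytes(StandardCharsets.UTF_8);
        String fromBody = pool.addPicture(logo);
        String fromHeader = pool.addPicture(logo.clone());
        System.out.println(fromBody.equals(fromHeader)); // true: one shared part
        System.out.println(pool.release(fromBody));      // false: header still uses it
        System.out.println(pool.release(fromHeader));    // true: last user is gone

        IdManager ids = new IdManager();
        ids.register(1);
        ids.register(2);
        System.out.println(ids.reserveNew());            // 3
    }
}
```

A bidirectional relation model would answer "who uses this part?" as well, whereas the counter in this sketch only answers "is it still used?", which is all that is needed to decide about deletion.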

Point 4 is still largely unimplemented, including unit tests, but I'm working on it. The work involves adding the CTPicture to the XML, adding an XWPFPicture to the XWPFRun, creating a new PackagePart for the image, relating the XWPFPictureData to the part that the XWPFRun belongs to, etc. To make this possible, I want to add "RunElements" as generic elements that may appear beneath a CTR. Loading a CTR as an XWPFRun then lets you traverse its content, such as an "XWPFText", a break, etc. It also lets you add / insert specific content into the run, using POI objects from another document or new POI objects, instead of using XmlCursor and the generic CT classes.
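
A rough sketch of how such run elements could look, again in plain Java and without any POI or XMLBeans types. All names (RunElement, Text, Break, Picture, Run) are hypothetical and only illustrate the idea of a run that can be traversed and modified through typed objects; the real classes would wrap the underlying CT beans.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch; these are not existing POI classes.
class RunElementSketch {

    /** Anything that may appear as a child of a run. */
    interface RunElement {
        String describe();
    }

    static final class Text implements RunElement {
        final String value;
        Text(String value) { this.value = value; }
        public String describe() { return "text: " + value; }
    }

    static final class Break implements RunElement {
        public String describe() { return "break"; }
    }

    static final class Picture implements RunElement {
        final String relationId; // ID of the relation to the image part
        Picture(String relationId) { this.relationId = relationId; }
        public String describe() { return "picture: " + relationId; }
    }

    /** A run that exposes its children as an ordered list of elements. */
    static final class Run {
        private final List<RunElement> elements = new ArrayList<>();

        void add(RunElement element) { elements.add(element); }

        void insert(int index, RunElement element) { elements.add(index, element); }

        /** Read-only view for traversal. */
        List<RunElement> elements() { return Collections.unmodifiableList(elements); }
    }

    public static void main(String[] args) {
        Run run = new Run();
        run.add(new Text("Hello"));
        run.add(new Break());
        run.insert(1, new Picture("rId7")); // picture goes between text and break
        for (RunElement element : run.elements()) {
            System.out.println(element.describe());
        }
    }
}
```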

Are there any concerns, proposals or hints regarding this approach?

Kind regards,
Stefan Stern
