[
https://issues.apache.org/jira/browse/PDFBOX-5263?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Christian Appl updated PDFBOX-5263:
-----------------------------------
Attachment: PDFBOX-5263_Introduce_COSReferenceInfo.patch
> Suggestion: Signing actual document changes - Enhancing incremental saving
> --------------------------------------------------------------------------
>
> Key: PDFBOX-5263
> URL: https://issues.apache.org/jira/browse/PDFBOX-5263
> Project: PDFBox
> Issue Type: Improvement
> Components: Parsing, PDModel, Writing
> Affects Versions: 3.0.0 PDFBox
> Reporter: Christian Appl
> Priority: Major
> Fix For: 3.0.0 PDFBox
>
> Attachments: Enhanced_incremental_saving_.patch,
> Enhanced_incremental_saving_PDFBox3.patch, NoSignatureFound.zip,
> Observer_based_incremental_saving(09-13-2021).patch,
> Observer_based_incremental_saving_(09-09-2021-09-09).patch,
> Observer_based_incremental_saving_(12-00-2021-09-08).patch,
> Observer_based_incremental_saving_(15-19-2021-09-08).patch,
> Observer_based_incremental_saving_(17-00-2021-09-07).patch,
> Observer_based_incremental_saving_(fixed_).patch,
> Observer_based_incremental_saving_-_still_eroneous.patch,
> Observer_based_incremental_saving_.patch,
> PDFBOX-5263-YTW2VWJQTDAE67PGJT6GS7QSKW3GNUQR_afterPatch.pdf,
> PDFBOX-5263-YTW2VWJQTDAE67PGJT6GS7QSKW3GNUQR_beforePatch.pdf,
> PDFBOX-5263_Introduce_COSReferenceInfo.patch,
> Prototype_COSChangeObserver_(code_base).patch,
> Prototype__Document_and_reference_holder_aware_COSContext_.patch,
> Updating_context_management_.patch, image-2021-08-23-14-55-24-077.png,
> image-2021-08-26-09-52-33-567.png, image-2021-08-26-09-54-24-897.png,
> image-2021-08-26-10-00-07-383.png, image-2021-08-26-10-02-08-003.png,
> image-2021-08-26-10-03-47-940.png, image-2021-08-26-10-06-42-925.png,
> image-2021-08-26-10-09-12-698.png, image-2021-08-26-10-12-19-265.png,
> image-2021-09-06-17-06-59-667.png, image-2021-09-07-09-35-31-408.png,
> image-2021-09-07-13-33-00-161.png, image-2021-09-07-15-40-59-080.png,
> image-2021-09-08-10-23-44-036.png, image-2021-09-08-11-18-34-211.png,
> image-2021-09-13-14-40-33-049.png, image-2021-09-13-14-41-13-206.png,
> out.pdf, out2.pdf
>
>
> *TL;DR:*
> Currently it is rather tedious to create incremental changes in between
> signatures via PDFBox. I attempted to simplify that and wrote a patch.
> This is rather a POC, than an actual suggestion for direct inclusion. (For
> reasons explained later.)
> *Signatures and incremental PDF documents:*
> A typical reason for wanting to sign a document multiple times (creating an
> incremental PDF) is , that in between signatures the document changed and the
> additional signature shall sign the new state of the document.
> If one wanted to implement such incremental changes using PDFBox, he would
> find, that most of the time made changes are completly ignored, when calling
> "saveIncremental".
> As documented for the "saveIncremental" methods and especially the matching
> constructors in "COSWriter", this would require, to identify the "path" of
> all made changes and one would need to set the "needToBeUpdated" flag of all
> elements of that path.
> *But:*
> As documented one would have to have exact understanding of what he did and
> how the PDF standard does implement this, he would have to identify said
> structures and the more complex the changes were, the more tedious this would
> become.
> *Also:*
> Because of the implementation of incremental saving in COSWriter, the whole
> path must be informed that it required an update.
> Resulting in unnecessary large increments, as not all ancestors might
> actually have changed.
> e.g. If one added an image to a preexisting page of the document - the
> contentstream, the resources of the page and the page dictionary would have
> changed. But the "pages" array and all it's ancestors would not have changed
> a bit, but still would have to be informed and included.
> *Assumptions that lead to this patch:*
> - COSWriter should not stop iterating a COSTree just because a parent element
> did not change. It's descendants still could have changed!
> - Externally managing an object´s update state is tedious and error-prone.
> Objects that implement "COSUpdateInfo" should know and manage by themselves
> whether they were freshly created or altered
> (e.g.: A COSDictionary should be able to remember, that a setter had been
> called).
> - If "COSUpdateInfo" objects were self aware and would solve this by
> themselves, it would not be necessary anymore to set update states manually.
> *Problems:*
> The first and obvious problem is, that the initial loading of a document is
> creating and altering new COS structures and we obviously don't want objects
> to observe and remember those changes. An object that is created during
> document initialization must be treated as preexisting.
> However: COSBase is not context aware - it does know it's descendants, but
> neither does it know it's parent, nor does it know it's root.
> If it was, that actually would present the optimal solution, as in that case
> the Object could ask it's root for the current load state and therefore would
> be able to ignore said changes caused by the initial loading of a document.
> But it is not. (My opinion is - it should be! But more on that later.)
> Therefore a a helper named COSUpdateInfoList was implemented, which was
> capable of finding COSUpdateInfo objects in a COS structure, and that allowed
> resetting their update state after loading was completed.
> *Description of the patch:*
> The patch implements selfaware COSUpdateInfo objects, which the COSWriter has
> been adapted to process. PDFBox therefore is capable of monitoring changes in
> realtime and to automatically include altered structures in an incremental
> save of the document, therefore creating increments (or an increment), that a
> signature would sign.
> *Result:*
> Using this patch documents could be created:
> incrementally adding pages, adding contents to pages, adding annotations,
> altering structures, removing structures.
> As far as has been initially tested the resulting documents were valid,
> viewable in a reader and the objects overwritten in increments seemed correct.
> *But -* *Caveat:*
> This patch does introduce atleast one ugly class (most likely you will be
> able to point out more, that could be optimized :)) and that is
> "COSUpdateInfoList" - as already explained: In my opinion such a class should
> not exist, the COSUpdateInfo objects should be context aware and should be
> capable of regulating their own behaviour.
> Whenever the alternatives are to either manage an object externally, or to
> "teach" an object to solve problems autonomously, I will tend to prefer the
> latter... but I did not dare to do that.
> This would require, that either further constructors or setters would have to
> be introduced for COSBase objects, that allowed setting parent/root/context
> for the object.
> Which would result in further massive changes for using applications and
> PDFBox itself - as all instantiations of COSBase objects (PDObjects) would
> have to be adapted.
> However: I would prefer if COSBase objects actually were context aware.
> But as stated... I did not dare to touch it and instead chose the ugly
> workarround, that would introduce yet another iteration over the whole
> COSDocument structure.
> Eliminating COSUpdateInfoList would be preferable!
> *Suggestion:*
> As PDFBox 3 is already changing how documents and objects are handled, I
> would suggest, that also COSBase objects should be made context and selfaware
> in PDFBox 3.
> This would allow simplifying handling COS objects using PDFBox and it would
> allow for an easier and automized handling of incremental saving.
> *Usage example:*
> The following "pseudo code" (actually using simplified Helper classes)
> demonstrates the intended usage:
> !image-2021-08-23-14-55-24-077.png!
> *As always:* Thank you very much for your work and support! I hope this
> suggestion is to your liking.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]