[ 
https://issues.apache.org/jira/browse/PDFBOX-5962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17929457#comment-17929457
 ] 

Michael Klink commented on PDFBOX-5962:
---------------------------------------

{quote}My general expectation is that "stuff should just work".{quote}

Well, as the PDF is broken, _just working_ would mean the result is something 
probably even more broken. GIGO.

Yes, in the case at hand the repair is pretty obvious: collect the orphan 
fields and add them to the *AcroForm Fields*. And if you feel generous, you'll 
even try to merge them with existing fields of the same name.

But if you don't want to break even more in the process than you repair, you 
have to make sure that the existing *AcroForm* fields and the orphan fields 
with the same name match in their attributes. For example, if the types don't 
match, you'll have two incompatible field objects with the same name in your 
form, which may lead to the weirdest behavior in post-processors or viewers.
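
To make the name/type check concrete, here is a minimal sketch in plain Java. 
It deliberately does not use PDFBox: {{Field}}, {{Result}} and {{classify}} 
are hypothetical names made up for illustration, a field is reduced to its 
name and its field type, and only the type is compared. A real fixup would 
have to compare more attributes than that.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Toy model only: these types are hypothetical and are NOT PDFBox classes. */
class OrphanFieldRepair {

    /** A form field reduced to its fully qualified name and its type (e.g. "Tx", "Btn"). */
    record Field(String name, String type) {}

    /** Orphans that could be added without a type clash, and those that clash. */
    record Result(List<Field> safeToAdd, List<Field> conflicting) {}

    /**
     * Splits the orphan fields into those whose name is either unused or already
     * used with the same type, and those whose name is already used with a
     * different type.
     */
    static Result classify(List<Field> acroFormFields, List<Field> orphans) {
        // Remember which type each field name is already bound to.
        Map<String, String> typeByName = new LinkedHashMap<>();
        for (Field f : acroFormFields) {
            typeByName.put(f.name(), f.type());
        }

        List<Field> safe = new ArrayList<>();
        List<Field> conflicting = new ArrayList<>();
        for (Field orphan : orphans) {
            String existingType = typeByName.get(orphan.name());
            if (existingType == null || existingType.equals(orphan.type())) {
                safe.add(orphan);
                // A new name is now taken, so later orphans are checked against it too.
                typeByName.putIfAbsent(orphan.name(), orphan.type());
            } else {
                conflicting.add(orphan);
            }
        }
        return new Result(safe, conflicting);
    }
}
```

Calling {{classify}} with an existing text field "name" and orphans "name" 
(button), "name" (text) and "city" (text) would put the button into the 
conflicting list and the other two into the safe list. What to do with the 
conflicting ones is exactly the open question.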

And if you generously merge the orphans into the *AcroForm* fields, even more 
can go wrong.

You can of course opt to add only those orphans to the *AcroForm Fields* that 
cause no issues. But I think that will be even more difficult to understand for 
the users you originally wanted to help with the fixups.

To put it briefly, a _good_ fixup (that in general won't break more than it 
repairs) of this issue is much more complicated than it may look at first.



> Saving PDDocument with flattened form retains fields
> ----------------------------------------------------
>
>                 Key: PDFBOX-5962
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-5962
>             Project: PDFBox
>          Issue Type: Bug
>          Components: AcroForm
>    Affects Versions: 3.0.4 PDFBox
>         Environment: Java 21
>            Reporter: Rune Flobakk
>            Priority: Major
>         Attachments: form-problem.pdf
>
>
> I believe I may have found a bug or at least a certain change in behavior 
> introduced in v3.0.4.
> For some PDAcroForms, after flattening the form, they seem to somehow retain 
> their fields when saving the PDDocument. {{PDAcroForm.getFields()}} is an 
> empty list after flattening (as I believe is expected), but when saving the 
> {{PDDocument}}, and re-reading the saved file {{PDAcroForm.getFields()}} 
> contains the fields of the form before it was flattened. Opening the saved 
> file in a PDF viewer also shows the form as editable.
> The flattening works as expected in v3.0.3, and the form becomes non-editable 
> with the values displayed as expected.
> I notice that for this particular PDF I am testing with, there is a lot of 
> logging like this in v3.0.4 when invoking {{.flatten()}}:
> {code:java}
> WARN missing /P entry (page reference) in a widget for field: ... {code}
> So there are apparently some issues with the particular PDF, though it worked 
> as expected in v3.0.3. I see this logging was introduced here: 
> [https://github.com/apache/pdfbox/commit/e49649ae89c913058c1be79bec6b4f561fc1f0b6]
>  which is part of PDFBOX-5225. The {{.flatten()}} invocation succeeds in both 
> versions, but the flattening operation seems not to be effective in the saved 
> PDF file when using v3.0.4.
> I have made a small project demonstrating the problem here:
> [https://github.com/runeflobakk/pdfbox-flatten-form-save-issue]
> There is a {{FlattenFormTest}} JUnit test demonstrating the process for both 
> a problematic PDF and one which works as expected. Changing the pdfbox 
> dependency version to 3.0.3 makes both tests pass. The saved files appear in 
> the target directory for inspection.
> Thank you, and please let me know if there are any details I may have left 
> out!



