[ 
https://issues.apache.org/jira/browse/PDFBOX-5962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17930385#comment-17930385
 ] 

Rune Flobakk commented on PDFBOX-5962:
--------------------------------------

I have done some more work on this today, and in particular revised my use 
of the PDFBox API.

The discussion in this issue drew my attention to the fact that 
{{getAcroForm()}} (with no arguments) may actually _mutate_ the loaded 
document. (That goes against my expectations, but that is another 
discussion.) In my particular case, I believe the correct way to retrieve the 
form is {{getAcroForm(null)}}, since I do not want any changes performed on 
the PDF unless I explicitly trigger them. There may be good reasons for 
{{getAcroForm()}} to implicitly "fix" breakage in the loaded PDF; I do not 
know the history of this. But I also think [~mkl] touches on a valid point 
about why such implicit mutations may be a slippery slope:
{quote}a _good_ fixup (that in general won't break more than it repairs) of 
this issue is much more complicated than it may look at first.
{quote}
 

So to me, the change in behavior from 3.0.3 to 3.0.4 is first and foremost 
that invoking {{.flatten()}} on forms in documents produced by Quartz 
PDFContext results in PDFs where the forms are +still editable+. I fully 
acknowledge that this happens with a PDF which already has some breakage and 
internal inconsistencies, but v3.0.3 demonstrated that it _is_ possible to 
flatten these forms and produce the result expected, at least by a pure 
consumer of the resulting PDF. I am crossing my fingers that this can be 
fixed, and I am of course ready to do any testing if that can help.
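
For reference, here is a minimal sketch of the round trip I am describing, assuming PDFBox 3.x on the classpath. The class name {{FlattenRoundTrip}} and the helper {{fieldsAfterFlatten}} are made up for illustration and are not taken from my test project. It retrieves the form with {{getAcroForm(null)}}, flattens it, saves to memory, reloads, and counts the fields that survive.

```java
import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.IOException;

import org.apache.pdfbox.Loader;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.interactive.form.PDAcroForm;

// Hypothetical helper class, for illustration only.
public class FlattenRoundTrip {

    /**
     * Flattens the document's form (if any), saves the document to memory,
     * reloads it, and returns the number of top-level fields still present.
     */
    static int fieldsAfterFlatten(PDDocument doc) throws IOException {
        // Passing null as the fixup avoids the implicit changes that the
        // no-argument getAcroForm() may apply to the loaded document.
        PDAcroForm form = doc.getDocumentCatalog().getAcroForm(null);
        if (form == null) {
            return 0; // the document has no form
        }
        form.flatten();

        ByteArrayOutputStream out = new ByteArrayOutputStream();
        doc.save(out);

        // Re-read the saved bytes and inspect the form again.
        try (PDDocument reloaded = Loader.loadPDF(out.toByteArray())) {
            PDAcroForm reloadedForm = reloaded.getDocumentCatalog().getAcroForm(null);
            return reloadedForm == null ? 0 : reloadedForm.getFields().size();
        }
    }

    public static void main(String[] args) throws IOException {
        // args[0] is the path to the PDF to check.
        try (PDDocument doc = Loader.loadPDF(new File(args[0]))) {
            System.out.println("fields after flatten + reload: " + fieldsAfterFlatten(doc));
        }
    }
}
```

A return value greater than zero for a document whose form was just flattened would correspond to the behavior reported in this issue. I have not included expected numbers here, since they depend on the PDF and the PDFBox version used.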

> Saving PDDocument with flattened form retains fields
> ----------------------------------------------------
>
>                 Key: PDFBOX-5962
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-5962
>             Project: PDFBox
>          Issue Type: Bug
>          Components: AcroForm
>    Affects Versions: 3.0.4 PDFBox
>         Environment: Java 21
>            Reporter: Rune Flobakk
>            Priority: Major
>         Attachments: form-problem.pdf
>
>
> I believe I may have found a bug or at least a certain change in behavior 
> introduced in v3.0.4.
> For some PDAcroForms, the fields somehow survive flattening when the 
> PDDocument is saved. {{PDAcroForm.getFields()}} is an empty list after 
> flattening (as I believe is expected), but after saving the {{PDDocument}} 
> and re-reading the saved file, {{PDAcroForm.getFields()}} contains the 
> fields the form had before it was flattened. Opening the saved file in a 
> PDF viewer also shows the form as editable.
> The flattening works as expected in v3.0.3, and the form becomes non-editable 
> with the values displayed as expected.
> I notice that for this particular PDF I am testing with, there is a lot of 
> logging like this in v3.0.4 when invoking {{.flatten()}}:
> {code:java}
> WARN missing /P entry (page reference) in a widget for field: ... {code}
> So there are apparently some issues with the particular PDF, though it worked 
> as expected in v3.0.3. I see this logging was introduced here: 
> [https://github.com/apache/pdfbox/commit/e49649ae89c913058c1be79bec6b4f561fc1f0b6]
>  which is part of PDFBOX-5225. The {{.flatten()}} invocation succeeds in both 
> versions, but the flattening does not seem to take effect in the saved 
> PDF file when using v3.0.4.
> I have made a small project demonstrating the problem here:
> [https://github.com/runeflobakk/pdfbox-flatten-form-save-issue]
> There is a {{FlattenFormTest}} JUnit test demonstrating the process for both 
> a problematic PDF and one which works as expected. Changing the pdfbox 
> dependency version to 3.0.3 makes both tests pass. The saved files appear in 
> the target directory for inspection.
> Thank you, and please let me know if there are any details I may have left 
> out!



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

