[ https://issues.apache.org/jira/browse/PDFBOX-5962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17930385#comment-17930385 ]
Rune Flobakk commented on PDFBOX-5962:
--------------------------------------

I did some more work on this today, and in particular revised my use of the PDFBox API. The discussion in this issue drew my attention to the fact that {{getAcroForm()}} (with no arguments) may actually _mutate_ the loaded document. (That goes against my expectations, but that is another discussion.) In my particular case, I believe the correct way to retrieve the form is {{getAcroForm(null)}}, since I do not want any changes made to the PDF unless I explicitly trigger them.

There may be good reasons for {{getAcroForm()}} to implicitly "fix" breakage in the loaded PDF; I do not know the history of this. But I also think [~mkl] touches on a valid point about why such implicit mutations may be a slippery slope:
{quote}a _good_ fixup (that in general won't break more than it repairs) of this issue is much more complicated than it may look at first.
{quote}
So to me, the changed behavior from 3.0.3 to 3.0.4 is first and foremost that invoking {{.flatten()}} on forms in documents produced by Quartz PDFContext results in PDFs where the forms are +still editable+. I fully acknowledge that this happens with a PDF that already has some breakage and internal inconsistencies, but v3.0.3 demonstrated that it _is_ possible to flatten these forms and produce the result that a pure consumer of the resulting PDF would expect.

I am crossing my fingers that this can be fixed, and I am of course ready to do any testing that can help.

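The difference between the two getters described above can be pictured with a small toy model. This is only a sketch under stated assumptions: {{ToyCatalog}}, {{ToyForm}} and {{adoptOrphans}} are hypothetical names invented for illustration and are not PDFBox classes or methods, and the "fixup" shown (adopting widgets that are missing from the field list) is a simplified guess at the kind of repair under discussion, not a description of what PDFBox actually does.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Toy model only: ToyCatalog and ToyForm are hypothetical stand-ins, NOT PDFBox classes.
public class FixupSketch {

    /** Hypothetical stand-in for a form that lists its field names. */
    static final class ToyForm {
        final List<String> fields = new ArrayList<>();
    }

    /** Hypothetical stand-in for a catalog that owns a form. */
    static final class ToyCatalog {
        private final ToyForm form = new ToyForm();
        // Widgets that exist on pages but are missing from the form's field list.
        private final List<String> orphanWidgets;

        ToyCatalog(List<String> orphanWidgets) {
            this.orphanWidgets = new ArrayList<>(orphanWidgets);
        }

        /** No-argument getter: applies a default fixup, so reading the form may mutate it. */
        ToyForm getForm() {
            return getForm(this::adoptOrphans);
        }

        /** Getter with an explicit fixup; passing null returns the form untouched. */
        ToyForm getForm(Consumer<ToyForm> fixup) {
            if (fixup != null) {
                fixup.accept(form);
            }
            return form;
        }

        // Assumed default repair: add each orphan widget to the field list once.
        private void adoptOrphans(ToyForm f) {
            for (String widget : orphanWidgets) {
                if (!f.fields.contains(widget)) {
                    f.fields.add(widget);
                }
            }
        }
    }

    public static void main(String[] args) {
        ToyCatalog catalog = new ToyCatalog(List.of("name", "address"));
        // Reading with a null fixup leaves the form as it was loaded.
        System.out.println(catalog.getForm(null).fields.size()); // prints 0
        // Reading through the no-argument getter silently changes the form.
        System.out.println(catalog.getForm().fields.size());     // prints 2
    }
}
```

In this model, a caller that only wants to inspect the form has to opt out of the repair explicitly, which mirrors the concern raised in the comment about implicit mutation on read.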
> Saving PDDocument with flattened form retains fields
> ----------------------------------------------------
>
>                 Key: PDFBOX-5962
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-5962
>             Project: PDFBox
>          Issue Type: Bug
>          Components: AcroForm
>    Affects Versions: 3.0.4 PDFBox
>         Environment: Java 21
>            Reporter: Rune Flobakk
>            Priority: Major
>         Attachments: form-problem.pdf
>
>
> I believe I may have found a bug, or at least a change in behavior,
> introduced in v3.0.4.
> For some PDAcroForms, after flattening the form, they seem to somehow retain
> their fields when saving the PDDocument. {{PDAcroForm.getFields()}} is an
> empty list after flattening (as I believe is expected), but after saving the
> {{PDDocument}} and re-reading the saved file, {{PDAcroForm.getFields()}}
> contains the fields the form had before it was flattened. Opening the saved
> file in a PDF viewer also shows the form as editable.
> The flattening works as expected in v3.0.3, and the form becomes non-editable
> with the values displayed as expected.
> I notice that for the particular PDF I am testing with, there is a lot of
> logging like this in v3.0.4 when invoking {{.flatten()}}:
> {code:java}
> WARN missing /P entry (page reference) in a widget for field: ... {code}
> So there are apparently some issues with the particular PDF, though it worked
> as expected in v3.0.3. I see this logging was introduced here:
> [https://github.com/apache/pdfbox/commit/e49649ae89c913058c1be79bec6b4f561fc1f0b6]
> which is part of PDFBOX-5225. The {{.flatten()}} invocation succeeds in both
> versions, but the flattening operation seems not to be effective in the saved
> PDF file when using v3.0.4.
> I have made a small project demonstrating the problem here:
> [https://github.com/runeflobakk/pdfbox-flatten-form-save-issue]
> There is a {{FlattenFormTest}} JUnit test demonstrating the process for both
> a problematic PDF and one which works as expected.
> Changing the pdfbox dependency version to 3.0.3 makes both tests pass. The
> saved files appear in the target directory for inspection.
> Thank you, and please let me know if there are any details I may have left
> out!



--
This message was sent by Atlassian Jira
(v8.20.10#820010)