[ https://issues.apache.org/jira/browse/PDFBOX-4970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17207987#comment-17207987 ]
Michael Klink commented on PDFBOX-4970:
---------------------------------------
It would indeed be nice to have the PDF Debugger extended to provide
information on changes between revisions and on unused material in the PDF. Or,
even better, to have this information available via members of
{{PDDocument}}.
Beware, though:
{quote}we would like to be able to detect that kind of prepared documents for
the given attack. We were thinking to check if any duplicate object id is
present in the document to be signed.
{quote}
A duplicate object number in a revision does not by itself prove that the
document is _prepared for a shadow attack_; it merely has the _stink_ of such
an attack.
I played around a bit myself, and my sniffer routine found a number of false
positives, in particular:
* Some PDF processors write data to the PDF output stream early to save
memory. If the objects in question are changed again later in the same run, a
second, updated copy of the object is simply appended to the stream, and the
cross references later point to it instead of the earlier version. Because the
revision then contains two objects with the same number and similar but not
identical contents, one could falsely assume an attack preparation.
* If the PDF in question contains an embedded PDF attachment, numerous object
numbers are quite likely used in both the embedded and the embedding PDF.
Embedding PDF attachments like that _can_ be a preparation for a shadow attack
but usually isn't one.
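For illustration, a naive sniffer of this kind could look roughly like the Java sketch below. It is not the routine mentioned above and it does not use PDFBox; the class name {{DuplicateObjectSniffer}} and the method {{findDuplicates}} are made up for this example. It assumes that every {{%%EOF}} marker in the raw bytes ends a revision and that every {{N G obj}} token preceded by whitespace starts an indirect object, both of which are only heuristics.

```java
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: heuristic raw-byte scan, not a PDF parser. */
public class DuplicateObjectSniffer {

    // Either a "%%EOF" marker, or "<number> <generation> obj" preceded by
    // whitespace or the start of the input.
    private static final Pattern TOKEN = Pattern.compile(
            "%%EOF|(?<!\\S)(\\d{1,10})\\s+(\\d{1,5})\\s+obj\\b");

    /**
     * Returns, per revision index (0 = first revision), the object ids
     * ("number generation") that start more than once in that revision,
     * mapped to the offsets of their headers.
     */
    public static Map<Integer, Map<String, List<Integer>>> findDuplicates(byte[] pdf) {
        // ISO-8859-1 maps each byte to one char, so match offsets are byte offsets.
        String text = new String(pdf, StandardCharsets.ISO_8859_1);
        Map<Integer, Map<String, List<Integer>>> byRevision = new TreeMap<>();
        int revision = 0;
        Matcher m = TOKEN.matcher(text);
        while (m.find()) {
            if (m.group(1) == null) {
                revision++; // assumption: "%%EOF" closes the current revision
                continue;
            }
            // Normalize away leading zeros in the object and generation numbers.
            String id = Long.parseLong(m.group(1)) + " " + Long.parseLong(m.group(2));
            byRevision.computeIfAbsent(revision, r -> new LinkedHashMap<>())
                      .computeIfAbsent(id, k -> new ArrayList<>())
                      .add(m.start());
        }
        // Keep only ids seen at least twice, then drop revisions without findings.
        byRevision.values().forEach(ids -> ids.values().removeIf(o -> o.size() < 2));
        byRevision.values().removeIf(Map::isEmpty);
        return byRevision;
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 1) {
            System.err.println("usage: java DuplicateObjectSniffer <file.pdf>");
            return;
        }
        byte[] pdf = Files.readAllBytes(Paths.get(args[0]));
        findDuplicates(pdf).forEach((rev, ids) ->
                System.out.println("revision " + rev + ": " + ids));
    }
}
```

Such a scan would flag the early-write case from the first bullet just as readily as a deliberate preparation, which is exactly the false-positive problem. It also only sees what is visible in the raw bytes: objects inside compressed object streams and compressed attachments are missed, and a {{%%EOF}} inside an uncompressed attachment is miscounted as a revision boundary.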
Similarly, sniffing for the other attack types only finds _stinks_ of such
_preparations_, not conclusive evidence. E.g. mismatches between form field
values and their display values also occur for other, dumb reasons unrelated
to attacks.
Thus, you most likely won't _be able to detect that kind of prepared documents
for the given attack_, merely a _stink_ thereof.
Nonetheless, detecting even a mere stink can be worthwhile, because an attacker
can probably exploit such accidentally existing structures just like a
deliberate attack preparation. The result might be subtle changes, e.g. a
switch back to a previous revision of some paragraph in a contract which, for
good reasons, was not signed in that original form.
> Possibility to detect duplicate ids in a revision
> -------------------------------------------------
>
> Key: PDFBOX-4970
> URL: https://issues.apache.org/jira/browse/PDFBOX-4970
> Project: PDFBox
> Issue Type: Improvement
> Reporter: Pierrick Vandenbroucke
> Priority: Major
>
> We are trying to detect files which contain several objects with the same
> identifier within a revision or in a given PDF. Currently, that does not seem
> possible. We are running into this
> [map|https://github.com/apache/pdfbox/blob/2.0.21/pdfbox/src/main/java/org/apache/pdfbox/cos/COSDocument.java#L56]
> which only allows one instance per object id. Using the map brings
> limitations (e.g. rendering,...).
> Is it possible to detect such files?