adriano,

adriano wrote
> I am not referring to regular PDF documents, but intentionally altered
> ones made up by the bad guys in order to try and cause problems to some
> application. I am aware that a PDF document may have more than one
> /Catalog if it has been revised. 
> So what I was asking about is how to find suspicious objects in a PDF,
> like e.g. seemingly unused or duplicated ones of certain types (like a
> double /Catalog when the document has no revisions) ....

I think you should first clarify what you expect of incoming PDFs. Without
that, there can be no concept of suspicious objects, because documents in
the wild may contain just about anything.

For example, your use case might involve PDFs that are generated, or at
least finally manipulated, by a single program. Such a program may always do
its job in a particular way that leaves recognizable patterns in the
resulting PDFs. Your expectation would then be that those patterns are
present. (The patterns depend on the specific program, though!)

Once that is done, you should define your term "suspicious objects" more
precisely than your "e.g. ... like ...".

If, as speculated above, you can expect the PDFs to exhibit certain
patterns, suspicious objects would be those that break those patterns.

But be aware that such an analysis requires a fairly homogeneous document
source (or at least a small collection of such sources).
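To make the idea of an expected pattern concrete, here is a minimal sketch in
plain Java (no iText calls). It assumes, purely for illustration, that your
one known producer never writes more /Catalog dictionaries than there are
%%EOF markers. The class and method names are made up, and the scan only sees
uncompressed objects, so treat it as a rough heuristic rather than a parser.

```java
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: checks one assumed producer-specific pattern.
final class CatalogHeuristic {
    // "/Type /Catalog" with optional whitespace; the lookahead rejects longer names.
    private static final Pattern CATALOG =
            Pattern.compile("/Type\\s*/Catalog(?![A-Za-z0-9])");
    // Each revision written as an incremental update ends with its own %%EOF.
    private static final Pattern EOF_MARKER = Pattern.compile("%%EOF");

    static int count(Pattern pattern, String text) {
        Matcher m = pattern.matcher(text);
        int n = 0;
        while (m.find()) {
            n++;
        }
        return n;
    }

    // True if the file has more catalogs than %%EOF markers, i.e. it breaks
    // the pattern we ASSUMED for our own document source.
    static boolean breaksExpectedPattern(byte[] pdf) {
        // ISO-8859-1 maps every byte to exactly one char, so nothing is lost.
        String text = new String(pdf, StandardCharsets.ISO_8859_1);
        return count(CATALOG, text) > count(EOF_MARKER, text);
    }
}
```

A file for which breaksExpectedPattern returns true is not necessarily
malicious; it merely does not look like the output of the program you expect.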


Alternatively, as you keep mentioning manipulated PDFs, you may already have
received such bogus documents. If you have several of them, you can analyze
them and try to find manipulation patterns in them. These patterns must
definitely stand out from the multitude of correct input documents, though;
otherwise you'll get too many false positives.


If you really are trying to harden a process in which manipulation in
transit is likely, you should IMO consider introducing electronic signatures
(in a broad sense: anything goes as long as it is secure; it does not have
to involve legally backed qualified signatures) and reject any input without
a signature or with a broken signature.
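As one possible reading of "in a broad sense", the following sketch checks a
detached signature delivered alongside the PDF, using only the standard
java.security API. The class name, the choice of SHA256withECDSA and the idea
of a detached (rather than PDF-embedded) signature are my assumptions for
illustration; how the sender's public key reaches you is left open.

```java
import java.security.GeneralSecurityException;
import java.security.PublicKey;
import java.security.Signature;

// Hypothetical gate: accepts a document only with a valid detached signature.
final class SignedInputGate {
    static boolean accept(byte[] document, byte[] signature, PublicKey senderKey) {
        // No signature at all: reject outright.
        if (signature == null || signature.length == 0) {
            return false;
        }
        try {
            Signature verifier = Signature.getInstance("SHA256withECDSA");
            verifier.initVerify(senderKey);
            verifier.update(document);
            // False if the document or the signature was altered in transit.
            return verifier.verify(signature);
        } catch (GeneralSecurityException e) {
            // Malformed signature bytes or an unsuitable key count as broken.
            return false;
        }
    }
}
```

Anything for which accept returns false would simply be rejected before it
reaches the application you want to protect.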

Regards,   Michael

PS: In general, multiple objects of a type that a document needs only one of
are not suspicious, even if there are no multiple visible revisions. Some
programs change PDFs by inserting new and changed objects before the cross
reference table and updating that table so that it represents the new and
changed objects. Thus there are no visible revisions, but there are still
those objects you consider suspicious...
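If you want to see how often this happens in your own documents, a rough
sketch like the following lists object numbers that are defined more than
once in the raw file. It is plain Java with made-up names, it does not parse
the PDF properly (binary stream data can produce false matches, and objects
inside compressed object streams are not seen), so it is only meant for
getting a first impression.

```java
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: counts "N G obj" headers per object number/generation.
final class ObjectDefinitionCounter {
    // Object number, generation number, then the keyword "obj".
    private static final Pattern OBJ_HEADER =
            Pattern.compile("(?<![0-9])(\\d+)\\s+(\\d+)\\s+obj\\b");

    // Returns only the "N G" keys that are defined at least twice.
    static Map<String, Integer> redefined(byte[] pdf) {
        String text = new String(pdf, StandardCharsets.ISO_8859_1);
        Map<String, Integer> counts = new TreeMap<>();
        Matcher m = OBJ_HEADER.matcher(text);
        while (m.find()) {
            counts.merge(m.group(1) + " " + m.group(2), 1, Integer::sum);
        }
        counts.values().removeIf(n -> n < 2);
        return counts;
    }
}
```

Running this over a set of known good documents should show whether such
redefinitions are normal for your sources before you treat them as a warning
sign.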



--
View this message in context: 
http://itext-general.2136553.n4.nabble.com/Duplicate-indirect-objects-tp4657759p4657797.html
Sent from the iText - General mailing list archive at Nabble.com.

_______________________________________________
iText-questions mailing list
iText-questions@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/itext-questions

