This tool parses a PDF document to identify the fundamental elements used in the analyzed file. It does not render a PDF document. The parser’s code is quick-and-dirty: I’m not recommending it as a textbook example of a PDF parser, but it gets the job done.

You can see the parser in action in this screencast.

The stats option displays statistics of the objects found in the PDF document. Use it to identify PDF documents with unusual or unexpected objects, or to classify PDF documents. For example, I generated statistics for 2 malicious PDF files, and although they were very different in content and size, the statistics were identical, indicating that they used the same attack vector and shared the same origin.
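
To illustrate the kind of counting involved, here is a minimal Python sketch; it is not pdf-parser’s code. It assumes that object headers and /Type entries appear as plain text, so objects inside compressed object streams or names obfuscated with #xx escapes are missed. The function name pdf_stats is hypothetical.

```python
import re
from collections import Counter

# Indirect object header such as "12 0 obj" (object number, generation, keyword).
OBJ_HEADER = re.compile(rb"(\d+)\s+(\d+)\s+obj\b")
# /Type entry such as "/Type /Page" or "/Type/Catalog"; captures the Name value.
TYPE_ENTRY = re.compile(rb"/Type\s*(/[^\s/<>\[\]()]+)")


def pdf_stats(data: bytes) -> dict:
    """Return rough counts of indirect objects, streams and /Type values in raw PDF bytes."""
    types = Counter(m.group(1).decode("latin-1") for m in TYPE_ENTRY.finditer(data))
    return {
        "indirect_objects": len(OBJ_HEADER.findall(data)),
        # "endstream" also contains "stream", so subtract it to count stream keywords only.
        "streams": data.count(b"stream") - data.count(b"endstream"),
        "types": dict(types),
    }


if __name__ == "__main__":
    import sys

    with open(sys.argv[1], "rb") as handle:
        for key, value in pdf_stats(handle.read()).items():
            print(key, value)
```

Two files producing the same dictionary would be candidates for the kind of comparison described above, although this sketch counts far fewer elements than the real stats option.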

The search option searches for a string in indirect objects (but not inside the streams of indirect objects). The search is not case-sensitive, and it is susceptible to the obfuscation techniques I documented (as I’ve yet to encounter these obfuscation techniques in the wild, I decided not to resort to canonicalization).
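
A minimal Python sketch of such a search, assuming a simple regex scan for "N G obj … endobj" rather than a real parser; search_objects is a hypothetical name. The sample shows why the search is susceptible to obfuscation: a Name written with a #xx hex escape (#61 is the character a) does not match, and text inside a stream is deliberately skipped.

```python
import re
from typing import List

# An indirect object: "N G obj" ... "endobj" (non-greedy, may span lines).
INDIRECT_OBJECT = re.compile(rb"(\d+)\s+(\d+)\s+obj\b(.*?)endobj", re.DOTALL)
# Stream payload from the "stream" keyword to "endstream"; removed before searching.
STREAM_BODY = re.compile(rb"stream\r?\n.*?endstream", re.DOTALL)


def search_objects(data: bytes, needle: str) -> List[int]:
    """IDs of indirect objects whose text outside streams contains needle (case-insensitive)."""
    needle_lower = needle.lower().encode("latin-1")
    hits = []
    for match in INDIRECT_OBJECT.finditer(data):
        body_without_stream = STREAM_BODY.sub(b"", match.group(3))
        if needle_lower in body_without_stream.lower():
            hits.append(int(match.group(1)))
    return hits


if __name__ == "__main__":
    sample = (
        b"1 0 obj << /S /JavaScript /JS (app.alert) >> endobj\n"
        b"2 0 obj << /S /J#61vaScript >> endobj\n"  # hex-escaped Name, not canonicalized
        b"3 0 obj << /Length 10 >>\nstream\njavascript\nendstream\nendobj\n"
    )
    print(search_objects(sample, "javascript"))  # [1]
```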

The filter option applies the filter(s) to the stream. For the moment, only FlateDecode is supported (i.e. zlib decompression).
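
Since FlateDecode is zlib-format data, Python’s standard zlib module can inflate it. The sketch below is an illustration, not pdf-parser’s code: it assumes the stream data ends with an EOL right before endstream, and it ignores /Length. decompressobj is used instead of zlib.decompress so that a truncated stream yields whatever could be inflated rather than raising, which is a lenient choice of this sketch. The helper names are hypothetical.

```python
import re
import zlib

# Raw stream data: after the "stream" keyword and its EOL, up to the EOL before "endstream".
STREAM_DATA = re.compile(rb"stream\r?\n(.*?)\nendstream", re.DOTALL)


def flate_decode(raw: bytes) -> bytes:
    """Inflate /FlateDecode data (zlib format); bytes after the compressed data are ignored."""
    decompressor = zlib.decompressobj()
    return decompressor.decompress(raw) + decompressor.flush()


def decode_first_stream(obj_text: bytes) -> bytes:
    """Find the first stream in an indirect object's text and inflate it."""
    match = STREAM_DATA.search(obj_text)
    if match is None:
        raise ValueError("no stream found")
    return flate_decode(match.group(1))


if __name__ == "__main__":
    obj = (b"4 0 obj\n<< /Filter /FlateDecode >>\nstream\n"
           + zlib.compress(b"BT (Hello) Tj ET") + b"\nendstream\nendobj\n")
    print(decode_first_stream(obj))
```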

The raw option makes pdf-parser output raw data (i.e. not the printable Python representation).
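
The difference between the two output modes can be shown in a few lines of Python. This sketch assumes that the printable Python representation means the repr() of the bytes; render is a hypothetical helper, not part of pdf-parser.

```python
import sys


def render(data: bytes, raw: bool) -> bytes:
    """Bytes to write for one object: the data itself when raw, else its Python repr()."""
    if raw:
        return data
    # repr() of a bytes object is pure ASCII, with escapes such as \n and \xe2.
    return repr(data).encode("ascii")


if __name__ == "__main__":
    payload = b"%PDF-1.1\n%\xe2\xe3\xcf\xd3\n"
    sys.stdout.buffer.write(render(payload, raw=False) + b"\n")  # printable form
    sys.stdout.buffer.write(render(payload, raw=True))  # raw bytes, binary intact
```

Raw output is what you want when redirecting a decompressed stream to a file for further analysis.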

The objects option outputs the data of the indirect object whose ID was specified. This ID is not version dependent: if more than one object has the same ID (disregarding the version), all of these objects are output.
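
Selecting objects by ID while ignoring the version (the generation number in the "N G obj" header) might be sketched as follows. The regex scan is a simplification assumed for illustration, and select_objects is a hypothetical name. Duplicate IDs typically show up when a document has been updated incrementally, so older and newer copies of an object coexist in the file.

```python
import re
from typing import List, Tuple

# An indirect object: "N G obj" ... "endobj" (non-greedy, may span lines).
INDIRECT_OBJECT = re.compile(rb"(\d+)\s+(\d+)\s+obj\b(.*?)endobj", re.DOTALL)


def select_objects(data: bytes, object_id: int) -> List[Tuple[int, int, bytes]]:
    """(ID, version, body) for every indirect object numbered object_id, whatever its version."""
    return [
        (int(m.group(1)), int(m.group(2)), m.group(3).strip())
        for m in INDIRECT_OBJECT.finditer(data)
        if int(m.group(1)) == object_id
    ]


if __name__ == "__main__":
    sample = b"5 0 obj (first) endobj\n5 1 obj (updated) endobj\n6 0 obj (other) endobj\n"
    print(select_objects(sample, 5))  # [(5, 0, b'(first)'), (5, 1, b'(updated)')]
```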

The reference option selects all objects referencing the specified indirect object. This ID is not version dependent.
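
An indirect reference is written "N G R", so finding the objects that point to a given ID can be sketched as a pattern search in which the version (G) is a wildcard. This is an illustration under simplifying assumptions (a regex scan that also looks inside stream data), not pdf-parser’s implementation; referencing_objects is a hypothetical name.

```python
import re
from typing import List

# An indirect object: "N G obj" ... "endobj" (non-greedy, may span lines).
INDIRECT_OBJECT = re.compile(rb"(\d+)\s+(\d+)\s+obj\b(.*?)endobj", re.DOTALL)


def referencing_objects(data: bytes, target_id: int) -> List[int]:
    """IDs of indirect objects that contain a reference "target_id <any version> R"."""
    # (?<!\d) stops "12 0 R" from matching when target_id is 2.
    reference = re.compile(rb"(?<!\d)%d\s+\d+\s+R\b" % target_id)
    return [
        int(m.group(1))
        for m in INDIRECT_OBJECT.finditer(data)
        if reference.search(m.group(3))
    ]


if __name__ == "__main__":
    sample = (
        b"1 0 obj << /Pages 2 0 R >> endobj\n"
        b"2 0 obj << /Kids [3 0 R] >> endobj\n"
        b"3 0 obj << /Parent 2 0 R /Font 12 0 R >> endobj\n"
    )
    print(referencing_objects(sample, 2))  # [1, 3]
```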

The type option selects all objects of a given type. The type is a Name and as such is case-sensitive and must start with a slash character (/).
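
A minimal sketch of type selection, again assuming a regex scan rather than a real parser; objects_of_type is a hypothetical name. It enforces the two rules above: the Name must start with a slash, and the comparison is case-sensitive. It also makes sure /Page does not accidentally match /Pages.

```python
import re
from typing import List

# An indirect object: "N G obj" ... "endobj" (non-greedy, may span lines).
INDIRECT_OBJECT = re.compile(rb"(\d+)\s+(\d+)\s+obj\b(.*?)endobj", re.DOTALL)


def objects_of_type(data: bytes, type_name: str) -> List[int]:
    """IDs of indirect objects whose /Type entry is exactly type_name (case-sensitive)."""
    if not type_name.startswith("/"):
        raise ValueError("a PDF Name must start with '/', for example /Page")
    # The lookahead rejects a longer name: the next byte must be whitespace, a delimiter or the end.
    pattern = re.compile(
        rb"/Type\s*" + re.escape(type_name.encode("latin-1")) + rb"(?![^\s/<>\[\]()])"
    )
    return [int(m.group(1)) for m in INDIRECT_OBJECT.finditer(data) if pattern.search(m.group(3))]


if __name__ == "__main__":
    sample = (
        b"1 0 obj << /Type /Pages >> endobj\n"
        b"2 0 obj << /Type /Page >> endobj\n"
        b"3 0 obj << /Type/Page/Parent 1 0 R >> endobj\n"
        b"4 0 obj << /Type /page >> endobj\n"  # different case, so not a /Page
    )
    print(objects_of_type(sample, "/Page"))  # [2, 3]
```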

Download:

pdf-parser_V0_2_0.zip (https)

MD5: 973E57E5EA8706F92EB0D6BA46EE9EFD

SHA256: 637C95018653C406F0A3AF62E72D9BF396C4AC56A8189586EB59467BD364A7D6
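
To check a download against the published hashes, a short sketch using Python’s standard hashlib module could look like this. file_digests is a hypothetical helper, and the usage lines assume the zip was saved under its original name in the current directory.

```python
import hashlib
from typing import Tuple


def file_digests(path: str) -> Tuple[str, str]:
    """Return the (MD5, SHA256) hex digests of a file, uppercase like the values listed above."""
    md5 = hashlib.md5()
    sha256 = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in chunks so large files need not fit in memory.
        for chunk in iter(lambda: handle.read(65536), b""):
            md5.update(chunk)
            sha256.update(chunk)
    return md5.hexdigest().upper(), sha256.hexdigest().upper()


if __name__ == "__main__":
    md5_hex, sha256_hex = file_digests("pdf-parser_V0_2_0.zip")
    print(md5_hex == "973E57E5EA8706F92EB0D6BA46EE9EFD")
    print(sha256_hex == "637C95018653C406F0A3AF62E72D9BF396C4AC56A8189586EB59467BD364A7D6")
```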

make-pdf tools
make-pdf-javascript.py allows one to create a simple PDF document with embedded JavaScript that executes upon opening of the PDF document. It’s essentially glue code for the mPDF.py module, which contains a class with methods to create headers, indirect objects, stream objects, trailers and XREFs.
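
The pieces listed above (header, indirect objects, stream object, XREF, trailer) can be illustrated with a small Python sketch. MiniPDF and make_javascript_pdf are hypothetical names; this is not the mPDF.py API, and the object layout (a catalog whose /OpenAction points to a JavaScript action whose script lives in a stream) is one valid choice assumed here, not necessarily the one the tool uses. The page omits /Resources to keep the document minimal.

```python
from typing import Dict


class MiniPDF:
    """Hypothetical minimal PDF writer; an illustration, not the mPDF.py API."""

    def __init__(self) -> None:
        self.buffer = bytearray()
        self.offsets: Dict[int, int] = {}  # object number -> byte offset, for the XREF

    def header(self, version: str = "1.1") -> None:
        self.buffer += b"%PDF-" + version.encode("ascii") + b"\n"

    def indirect_object(self, obj_id: int, body: str) -> None:
        self.offsets[obj_id] = len(self.buffer)
        self.buffer += b"%d 0 obj\n%s\nendobj\n" % (obj_id, body.encode("latin-1"))

    def stream_object(self, obj_id: int, data: bytes) -> None:
        self.offsets[obj_id] = len(self.buffer)
        self.buffer += b"%d 0 obj\n<< /Length %d >>\nstream\n%s\nendstream\nendobj\n" % (
            obj_id, len(data), data)

    def xref_and_trailer(self, root_id: int) -> bytes:
        """Append the XREF table and trailer; assumes objects are numbered 1..N without gaps."""
        xref_offset = len(self.buffer)
        size = max(self.offsets) + 1
        # Each XREF entry is exactly 20 bytes: 10-digit offset, 5-digit generation, keyword, SP LF.
        entries = [b"0000000000 65535 f \n"]
        entries += [b"%010d 00000 n \n" % self.offsets[i] for i in range(1, size)]
        self.buffer += b"xref\n0 %d\n" % size + b"".join(entries)
        self.buffer += b"trailer\n<< /Size %d /Root %d 0 R >>\nstartxref\n%d\n%%%%EOF\n" % (
            size, root_id, xref_offset)
        return bytes(self.buffer)


def make_javascript_pdf(script: str = "app.alert('Hello from a PDF');") -> bytes:
    """Build a one-page PDF whose /OpenAction runs script when the document is opened."""
    pdf = MiniPDF()
    pdf.header()
    pdf.indirect_object(1, "<< /Type /Catalog /Pages 2 0 R /OpenAction 4 0 R >>")
    pdf.indirect_object(2, "<< /Type /Pages /Kids [3 0 R] /Count 1 >>")
    pdf.indirect_object(3, "<< /Type /Page /Parent 2 0 R /MediaBox [0 0 612 792] >>")
    pdf.indirect_object(4, "<< /Type /Action /S /JavaScript /JS 5 0 R >>")
    pdf.stream_object(5, script.encode("latin-1"))
    return pdf.xref_and_trailer(root_id=1)


if __name__ == "__main__":
    with open("javascript-sketch.pdf", "wb") as handle:  # hypothetical output file name
        handle.write(make_javascript_pdf())
```

Recording each object’s byte offset as it is written is what makes the XREF table cheap to produce at the end.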


If you execute it without options, it generates a PDF document with JavaScript that displays a message box (by calling app.alert).

To provide your own JavaScript, use option --javascript for a script on the command line, or --javascriptfile for a script contained in a file.
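
How the script source might be chosen from these two options can be sketched with argparse. This is an illustration, not the tool’s own option handling: DEFAULT_SCRIPT and choose_script are hypothetical names, the default message is invented, and making the two options mutually exclusive is a choice of this sketch.

```python
import argparse
from typing import List, Optional

DEFAULT_SCRIPT = "app.alert('Hello from a PDF');"  # hypothetical default message box


def choose_script(argv: Optional[List[str]] = None) -> str:
    """Return the JavaScript to embed: from the command line, from a file, or the default."""
    parser = argparse.ArgumentParser(description="Sketch of script-source selection")
    group = parser.add_mutually_exclusive_group()
    group.add_argument("--javascript", help="JavaScript given on the command line")
    group.add_argument("--javascriptfile", help="file containing the JavaScript")
    args = parser.parse_args(argv)
    if args.javascript is not None:
        return args.javascript
    if args.javascriptfile is not None:
        with open(args.javascriptfile, encoding="latin-1") as handle:
            return handle.read()
    return DEFAULT_SCRIPT


if __name__ == "__main__":
    print(choose_script())
```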

Download:

make-pdf_V0_1_0.zip (https)

MD5: 7682A66DCD0C3AF1D4A2AFA30D44AA8C

SHA256: 7E92B7EE4A3EE2FCFCAF0AC1398381E4F649A6E7C899351721D78D37D6018AA0

3 Comments

[...] PDF, Quickpost — Didier Stevens @ 11:57 Per request, a more detailed post on how I use my pdf-parser stats [...]

Pingback by Quickpost: Fingerprinting PDF Files « Didier Stevens — Saturday 1 November 2008 @ 11:57

I’d like to be able to view a scanned PDF file (with handwriting in some fields) and black out boxes on the form whose fields contain info I don’t want published.

Can that kind of thing be automated in a batch so that I don’t even have to open the files?

That would be cool …

Can you point me in the right direction ? I’m not looking for you to code, but sending me in the right direction for this would be useful, and it looks like you’re cognizant of this kind of information.

Comment by james — Friday 21 November 2008 @ 15:03
@james

I’ve no experience with such tools, but you could start by looking in the forum of PDF Planet.

Comment by Didier Stevens — Saturday 22 November 2008 @ 8:46

[linuxkernelnewbies] PDF Tools « Didier Stevens
