After a few hours with vbindiff I was able to figure it out.

For the benefit of others:
There are three reasons two consecutive creations of the "same" PDF will
have a diff:
1) CreationDate/ModDate timestamps - solved by removing both fields.
2) BaseFont names have a randomly generated prefix - solved by using the
same seed each time for the random generation.
3) The PDF's trailer includes a randomly generated string as its file ID -
solved by changing the appropriate method in PdfEncryption to always hash
the empty string.
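
To illustrate fixes 2 and 3, here is a minimal standalone sketch in plain Java. It is not the iText code and not the patch itself: the class name, method names, and seed value are made up for illustration, and I'm assuming the font prefix has the usual six-uppercase-letters-plus-"+" form and that the ID is an MD5 digest. It only shows why a fixed seed and a constant hash input give identical bytes on every run.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;
import java.util.Random;

// Hypothetical helper, not part of iText.
final class DeterministicPdfBits {
    // Assumed seed value; any constant works as long as it never changes.
    static final long FIXED_SEED = 42L;

    // Builds a subset prefix of six uppercase letters followed by '+',
    // drawing the letters from the supplied generator.
    static String subsetPrefix(Random rng) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 6; i++) {
            sb.append((char) ('A' + rng.nextInt(26)));
        }
        return sb.append('+').toString();
    }

    // Derives a file ID from constant input (the empty string) instead of
    // from time or other per-run data, so the digest is always the same.
    static byte[] constantFileId() throws NoSuchAlgorithmException {
        MessageDigest md5 = MessageDigest.getInstance("MD5");
        return md5.digest("".getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws Exception {
        // Two "runs", each starting from a freshly seeded generator.
        String first = subsetPrefix(new Random(FIXED_SEED));
        String second = subsetPrefix(new Random(FIXED_SEED));
        System.out.println("prefixes equal: " + first.equals(second));

        byte[] idA = constantFileId();
        byte[] idB = constantFileId();
        System.out.println("file IDs equal: " + Arrays.equals(idA, idB));
    }
}
```

The important detail is that the generator is re-seeded at the start of each document, so the sequence of prefixes depends only on the order in which fonts are added.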

The resulting patch is here:
http://github.com/ymasory/Flashup/blob/master/lib/nodiff-patch-r4594

It should not be used for documents that use encryption, since the file ID
is then no longer random.

Best,
Yuvi

-- 
Yuvi Masory
University of Pennsylvania


On Wed, Sep 29, 2010 at 3:16 AM, Yuvi Masory <[email protected]> wrote:

> Hi all,
>
> I'm trying to patch iText so two consecutively created PDF files can have
> zero diff.
> I've removed the CreationDate and ModDate fields, but there's still a diff.
> What else would distinguish two pdfs, other than the CreationDate/ModDate?
>
> Here's my patch:
> <http://github.com/ymasory/Flashup/blob/master/lib/nodiff-patch-r4594>
>
> (In case you're wondering, I want zero diff to greatly simplify my nightly
> build and snapshot backup system for my PDFs.)
>
> Thank you for any suggestions!
>
> Best,
> Yuvi
>
> --
> Yuvi Masory
> University of Pennsylvania
>
_______________________________________________
iText-questions mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/itext-questions
