Willie Alberty wrote:
Although you can open a PDF file in a text editor and more or less follow its structure, it is not a text file. PDF documents are binary files. You can irreparably damage a PDF by doing string replacement operations.

The reason for this is the cross-reference table, which sits near the end of a typical PDF file, just before the document trailer. It records the byte offset of every object in the document, and the trailer's startxref entry records where the table itself begins. If you do a string replacement that changes the byte length of the string, everything after it shifts, the recorded offsets no longer match, and the PDF viewer may be unable to read the document.
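A minimal Python sketch can make the offset problem concrete. It assumes an uncompressed PDF with a single classic cross-reference section and no incremental updates. The helpers build_minimal_pdf and xref_offsets_valid are hypothetical names for illustration. The code builds a tiny file, then checks whether startxref and each in-use xref entry still point at the right bytes after a same-length replacement and after a length-changing one.

```python
import re


def build_minimal_pdf(text: bytes) -> bytes:
    """Build a tiny uncompressed one-page PDF with a classic xref table (illustrative)."""
    content = b"BT /F1 12 Tf 72 720 Td (" + text + b") Tj ET"
    objects = [
        b"<< /Type /Catalog /Pages 2 0 R >>",
        b"<< /Type /Pages /Kids [3 0 R] /Count 1 >>",
        b"<< /Type /Page /Parent 2 0 R /MediaBox [0 0 612 792]"
        b" /Resources << /Font << /F1 5 0 R >> >> /Contents 4 0 R >>",
        b"<< /Length " + str(len(content)).encode() + b" >>\nstream\n" + content + b"\nendstream",
        b"<< /Type /Font /Subtype /Type1 /BaseFont /Helvetica >>",
    ]
    out = bytearray(b"%PDF-1.4\n")
    offsets = []
    for num, body in enumerate(objects, start=1):
        offsets.append(len(out))  # byte offset where "num 0 obj" begins
        out += f"{num} 0 obj\n".encode() + body + b"\nendobj\n"
    xref_pos = len(out)
    out += f"xref\n0 {len(objects) + 1}\n".encode()
    out += b"0000000000 65535 f \n"  # object 0: head of the free list
    for off in offsets:
        out += f"{off:010d} 00000 n \n".encode()  # each entry is exactly 20 bytes
    out += (f"trailer\n<< /Size {len(objects) + 1} /Root 1 0 R >>\n"
            f"startxref\n{xref_pos}\n%%EOF\n").encode()
    return bytes(out)


def xref_offsets_valid(pdf: bytes) -> bool:
    """Check that startxref and every in-use xref entry point at the right bytes."""
    m = re.search(rb"startxref\s+(\d+)\s+%%EOF", pdf)
    if m is None:
        return False
    pos = int(m.group(1))
    if not pdf.startswith(b"xref", pos):
        return False  # startxref no longer lands on the xref keyword
    header = re.match(rb"xref\s+(\d+)\s+(\d+)\s+", pdf[pos:])
    if header is None:
        return False
    first, count = int(header.group(1)), int(header.group(2))
    entries = pos + header.end()
    for i in range(count):
        off, gen, kind = pdf[entries + 20 * i: entries + 20 * (i + 1)].split()
        expected = f"{first + i} {int(gen)} obj".encode()
        if kind == b"n" and not pdf.startswith(expected, int(off)):
            return False  # entry points somewhere other than its object header
    return True


original = build_minimal_pdf(b"Hello")
same_length = original.replace(b"Hello", b"Howdy")     # 5 bytes -> 5 bytes
longer = original.replace(b"Hello", b"Hello, world")   # 7 bytes longer
print(xref_offsets_valid(original))
print(xref_offsets_valid(same_length))
print(xref_offsets_valid(longer))
```

The first two checks pass and the last one fails. Because the longer string lands inside object 4, everything stored after it (object 5, the xref table, and the position recorded by startxref) moves by 7 bytes while the recorded offsets stay put. The stream's /Length entry is left stale as well.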

If you're very careful to keep the byte length of the strings you're replacing unchanged, you can actually change existing PDF content this way, but you're treading on thin ice. If you also keep the byte offsets in that table updated, you can change string lengths too, but this quickly becomes difficult.
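The "keep the offsets updated" route can also be sketched in Python. The rebuild_xref helper below is hypothetical. It assumes an uncompressed file with one classic xref section and one trailer, no incremental updates or object streams, an unchanged set of objects, and no stream data containing a line that looks like an object header. It rescans the body for "N G obj" headers and writes a fresh table and startxref.

```python
import re


def rebuild_xref(pdf: bytes) -> bytes:
    """Recompute the xref table and startxref of a simple PDF (sketch).

    Assumes an uncompressed file with one classic xref section and one trailer,
    no incremental updates or object streams, an unchanged set of objects, and
    no stream data containing a line that looks like "N G obj".
    """
    xref_start = pdf.rindex(b"\nxref\n") + 1  # where the old table begins
    body = pdf[:xref_start]
    # Keep the existing trailer dictionary (it holds /Size, /Root, /Info ...).
    trailer = pdf[pdf.rindex(b"trailer"):pdf.rindex(b"startxref")]

    # Find the current byte offset of every "N G obj" header.
    found = {}
    for m in re.finditer(rb"(?m)^(\d+) (\d+) obj\b", body):
        found[int(m.group(1))] = (m.start(), int(m.group(2)))

    size = max(found) + 1
    table = bytearray(f"xref\n0 {size}\n".encode())
    for num in range(size):
        if num in found:
            offset, gen = found[num]
            table += f"{offset:010d} {gen:05d} n \n".encode()
        else:
            # Simplification: object 0 and any unused numbers are marked free
            # without building a properly linked free list.
            table += b"0000000000 65535 f \n"
    return body + bytes(table) + trailer + f"startxref\n{xref_start}\n%%EOF\n".encode()


sample = (
    b"%PDF-1.4\n"
    b"1 0 obj\n<< /Title (Draft) >>\nendobj\n"
    b"2 0 obj\n<< /Type /Catalog /Pages 3 0 R >>\nendobj\n"
    b"3 0 obj\n<< /Type /Pages /Kids [] /Count 0 >>\nendobj\n"
    b"xref\n0 1\n0000000000 65535 f \n"  # placeholder table, rebuilt below
    b"trailer\n<< /Size 4 /Root 2 0 R /Info 1 0 R >>\nstartxref\n0\n%%EOF\n"
)
pdf = rebuild_xref(sample)                          # offsets now correct
edited = pdf.replace(b"(Draft)", b"(Final draft)")  # objects 2 and 3 move by 6 bytes
fixed = rebuild_xref(edited)                        # offsets match again
print(fixed.decode("latin-1"))
```

Note that this only updates offsets. It does not fix a stream's /Length entry or the trailer's /Size, and it would be fooled by compressed object streams or incremental updates, which is part of why this route gets difficult in practice.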

Thanks for that thorough explanation, and thanks for the FOP link, Thomas. I'll take a look at it.

Thanks,
Kai
