Hi Ludo,

On Fri, 07 Jul 2017 14:02:04 +0200
[email protected] (Ludovic Courtès) wrote:
> Also, do you know whether the PDF specs are OK with that?

Yeah, at the upstream bug link <https://bugs.ghostscript.com/show_bug.cgi?id=698208> we discussed that (somewhat). While they don't want to carry the patches (because they don't want to lose functionality), they explained that it might well be that *future* versions of the spec could make ID and UUID mandatory.

Right now there's a stringent spec called PDF/A (for "archiving"; it is intended for governing bodies, where you don't want existing documents to dynamically alter their contents after some time - like with JavaScript or something) which already sets the instance UUID to "". So I just set it to "" always rather than just for PDF/A.

Also, as far as I understand, the "/ID" is currently only mandatory when encrypting, although that might change in the future.

That leaves the document UUID - and upstream, in some of the other bug reports, explained that they want UNIQUE document UUIDs. They are definitely not fine with non-unique UUIDs. So I figured that we should just leave it off - that way it's not the same value across multiple documents.

This RDF metadata stuff (the instance UUID and document UUID) is quite new. In a former life I wrote PDF parsers and I didn't handle the RDF back then at all. So I guess it would even work to leave the entire RDF metadata off - after all, it worked back then.

If someone is well-versed in XMP RDF metadata for PDF, I wonder which is better: leaving the entire RDF off or just leaving off the element containing the document ID (as an attribute). Currently, the patch does the latter.

The specification by Adobe (XMP Specification Part 1, ISO 16684-1:2011(E) Annex A) says "The use of robust GUIDs is encouraged; having globally unique values is important" but, as far as I can see, doesn't say whether they are mandatory.

I also thought of patching groff instead. But it seems that groff is now searching for a maintainer - I'm not sure anyone would integrate the change there. Also, I'm not well-versed in Perl.

Also, patching finished PDFs (using regexps or something) is kinda dangerous because nobody *forces* you to encode the streams (think: attachments) in PDFs. So it could be that some other non-PDF thing is embedded in the PDF as a stream, and the regexp substituter would just substitute inside that as well.

There's a program "pdfmark" which is supposed to be for changing the metadata of PDFs, but upstream said that it can't change those fields. It could change the CreationDate, ModDate etc.

In short, I think the lowest risk is patching Ghostscript as we did here.
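
To illustrate the regexp concern mentioned above, here is a minimal Python sketch. Everything in it is made up for illustration: the byte string is a hand-written fragment that only loosely resembles a PDF (it is not a valid file), the metadata element is simplified, and `naive_scrub` is a hypothetical helper, not anything Ghostscript or Guix provides. It only shows that a blind substitution cannot tell the metadata stream from an unencoded attachment:

```python
import re

# Hand-made bytes shaped loosely like PDF streams (NOT a valid PDF).
# The metadata element is simplified for illustration.
METADATA_STREAM = (
    b"stream\n"
    b"<meta DocumentID='uuid:11111111-2222-3333-4444-555555555555'/>\n"
    b"endstream\n"
)
# An attachment stored without any filter, so its bytes appear verbatim.
ATTACHMENT_STREAM = (
    b"stream\n"
    b"notes.txt: see uuid:aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee\n"
    b"endstream\n"
)
FAKE_PDF = b"%PDF-1.4\n" + METADATA_STREAM + ATTACHMENT_STREAM

# Matches anything that looks like "uuid:" followed by an 8-4-4-4-12 hex UUID.
UUID_RE = re.compile(rb"uuid:[0-9a-f]{8}(?:-[0-9a-f]{4}){3}-[0-9a-f]{12}")
# Same length as a real UUID, so byte offsets stay unchanged in this sketch.
BLANK = b"uuid:00000000-0000-0000-0000-000000000000"


def naive_scrub(data: bytes) -> bytes:
    """Blindly replace every UUID-looking byte sequence (hypothetical helper)."""
    return UUID_RE.sub(BLANK, data)


scrubbed = naive_scrub(FAKE_PDF)
# The user's attachment content was rewritten along with the metadata.
attachment_damaged = b"aaaaaaaa-bbbb" not in scrubbed
print("attachment modified:", attachment_damaged)
```

A substitution that changed the length of the match would additionally invalidate stream lengths and cross-reference offsets in a real PDF, which is one more reason to fix this in the producer rather than afterwards.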
