bug#27563: [PATCH v3 2/2] gnu: ghostscript: Write document ID only when encrypting.

Danny Milosavljevic Fri, 07 Jul 2017 06:36:45 -0700

Hi Ludo,

On Fri, 07 Jul 2017 14:02:04 +0200
Ludovic Courtès wrote:


> Also, do you know whether the PDF specs are OK with that?  

Yeah, we discussed that (somewhat) in the upstream bug report
<https://bugs.ghostscript.com/show_bug.cgi?id=698208>.  Upstream doesn't want
to carry the patches, because they don't want to lose functionality, and they
explained that *future* versions of the spec might well make the ID and UUID
mandatory.

Right now there's a stricter spec called PDF/A (for "archiving").  It is
intended for governing bodies, where you don't want existing documents to
dynamically alter their contents after some time, e.g. via JavaScript.  For
PDF/A the instance UUID is already set to "".  So I just set it to "" always,
rather than only for PDF/A.

Also, as far as I understand, "/ID" is currently mandatory only when
encrypting, although that might change in the future.

That leaves the document UUID.  In some of the other bug reports, upstream
explained that they want UNIQUE document UUIDs; they are definitely not fine
with non-unique ones.  So I figured we should just leave it off, so that the
same UUID doesn't end up in multiple documents.

This RDF metadata stuff (the instance UUID and document UUID) is quite new.  In
a former life I wrote PDF parsers, and back then I didn't handle the RDF at
all.  So I guess it would even work to leave off the entire RDF metadata;
after all, it worked back then.

If someone is well-versed in XMP RDF metadata for PDF: which is better,
leaving off the entire RDF, or just leaving off the element that contains the
document ID (as an attribute)?  Currently, the patch does the latter.  Adobe's
specification (XMP Specification Part 1, ISO 16684-1:2011(E), Annex A) says
"The use of robust GUIDs is encouraged; having globally unique values is
important" but, as far as I can see, doesn't say whether they are mandatory.

I also thought of patching groff instead, but it seems that groff is currently
looking for a maintainer, so I'm not sure anyone would integrate the change
there.  Also, I'm not well-versed in Perl.

Besides, patching finished PDFs (using regexps or similar) is kind of
dangerous, because nobody *forces* you to encode the streams in a PDF (think:
attachments).  So some other, non-PDF content could be embedded in the PDF as
a stream, and the regexp substituter would happily substitute inside it as
well.
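To illustrate the regexp problem, here's a minimal Python sketch.  The PDF
fragment is hypothetical and schematic (not a complete or valid PDF), and it
assumes that both the XMP metadata and an attachment are stored as
uncompressed streams and that the post-processor targets an
xmpMM:DocumentID attribute; it is not how groff or Ghostscript actually work.

```python
import re

# Hypothetical, schematic PDF fragment (NOT a complete or valid PDF).
# Object 1 is the XMP metadata stream; object 5 is an uncompressed
# embedded file (an "attachment") whose payload merely looks similar.
pdf = (
    b"1 0 obj\n<< /Type /Metadata /Subtype /XML >>\nstream\n"
    b'<rdf:Description xmpMM:DocumentID="uuid:1234-abcd"/>\n'
    b"endstream\nendobj\n"
    b"5 0 obj\n<< /Type /EmbeddedFile >>\nstream\n"
    b'notes: xmpMM:DocumentID="uuid:1234-abcd" came from the old file\n'
    b"endstream\nendobj\n"
)

# Naive post-processing: blank out every DocumentID with a byte regexp.
pattern = re.compile(rb'xmpMM:DocumentID="[^"]*"')
patched, count = pattern.subn(b'xmpMM:DocumentID=""', pdf)

# The regexp cannot tell metadata from attachment payload, so it hits both.
print(count)
print(patched.decode("ascii"))
```

Both matches get rewritten, even though only the one in object 1 is actually
metadata; the attachment's contents are silently corrupted.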

There's "pdfmark" (a PostScript operator), which is meant for changing the
metadata of PDFs, but upstream said that it can't change those fields.  It can
change CreationDate, ModDate, etc.

In short, I think patching Ghostscript as we did here is the lowest-risk
option.
