As a member of the PDF Association (pdfa.org), I followed a discussion there about some edge cases in XMP (XML) metadata that we interpret differently during PDF/A validation than Acrobat and BFO do. I'm allowed to share it:
<snip>
In this case we have a PDF with an XMP metadata stream containing two <rdf:RDF> entries, one with rdf:about set to a blank string, the other with it set to a UUID. The PDF/A specification (ISO-19005-1:2005(E) para 6.7.2) simply says that the stream must conform to the "XMP specification 2004 revision" which reads (p21):

The rdf:about attribute on the rdf:Description element is a required attribute that identifies the resource whose metadata this XMP describes. The value of this attribute must follow URI syntax and may be either:

● an empty string (as in the example above), which means that the XMP is physically local to the resource being described. Applications must rely on knowledge of the file format to correctly associate the XMP with the resource.

● a unique instance ID that is generated every time a file is saved. The next section gives guidelines for creating instance IDs.

The XMP packet must describe a single entity, and my reading of the above is a combination of empty-string and a unique UUID can meet this requirement - this is how both our software and Acrobat X and XI behave. However it's ambiguous, and this clause was revised in the 2012 revision (ISO 16684-1:2011(E) para 7.4) to this:

If the XMP data model has an AboutURI (6.1, “XMP packets”), that same URI shall be the value of an rdf:about attribute in each top-level rdf:Description element. Otherwise, the rdf:about attributes for all top-level rdf:Description elements shall be present with an empty value. The rdf:about attribute shall not be used in more deeply nested rdf:Description elements. For compatibility with very early XMP usage, it is recommended that XMP readers tolerate a missing rdf:about attribute and treat it as present with an empty value. It is also recommended that XMP readers tolerate a mix of empty and non-empty rdf:about values, as long as all non-empty values are identical.
Which means that an empty string and a unique UUID are technically incorrect, but it's recommended they be tolerated for compatibility purposes. I concede this is a very fine hair to split, but if you're writing software to validate or create PDF/A you have to make a decision one way or another. BFO and Acrobat X and XI think this is valid; PDFBox and the pdf-tools.com online validator lean the other way and classify this document as invalid.

The end result is a document which might be PDF/A compliant, but no one is really sure (and if anyone can give me a definitive answer please do - email me off-list if you want a copy of the document, I will need to get permission from my customer to forward it).
</snip>

I can also share a sample file if anyone is interested in working on this.

BR
Maruan Sahyoun
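
For anyone who wants to check which reading a given packet falls under, here is a minimal sketch of the ISO 16684-1 para 7.4 tolerance rule quoted above. It is not how PDFBox, BFO or Acrobat implement the check. The function name `classify_rdf_about`, the three result labels and the sample packet (including its UUID) are made up for illustration, and the sketch assumes the XMP has already been extracted from the PDF as a string.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# Hypothetical packet mirroring the case above: one empty rdf:about, one UUID.
SAMPLE = """<x:xmpmeta xmlns:x="adobe:ns:meta/"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <rdf:RDF>
    <rdf:Description rdf:about=""/>
  </rdf:RDF>
  <rdf:RDF>
    <rdf:Description rdf:about="uuid:00000000-0000-0000-0000-000000000000"/>
  </rdf:RDF>
</x:xmpmeta>"""


def classify_rdf_about(xmp_xml: str) -> str:
    """Classify the rdf:about values of all top-level rdf:Description elements.

    Returns one of three illustrative labels (not taken from any standard):
      "consistent"  - all values empty, or all equal to the same URI
      "mixed"       - empty and non-empty values, all non-empty ones identical
                      (not strictly conforming, but readers should tolerate it)
      "conflicting" - more than one distinct non-empty value
    """
    root = ET.fromstring(xmp_xml)
    values = []
    # iter() also yields the root itself if the packet starts at rdf:RDF.
    for rdf in root.iter(f"{{{RDF_NS}}}RDF"):
        # Only direct children count as top-level rdf:Description elements.
        for desc in rdf.findall(f"{{{RDF_NS}}}Description"):
            # A missing rdf:about is treated as present with an empty value.
            values.append(desc.get(f"{{{RDF_NS}}}about", ""))

    non_empty = {v for v in values if v}
    has_empty = any(v == "" for v in values)

    if len(non_empty) > 1:
        return "conflicting"
    if non_empty and has_empty:
        return "mixed"
    return "consistent"


if __name__ == "__main__":
    print(classify_rdf_about(SAMPLE))
```

Under this sketch the document in question lands in the "mixed" bucket, which is exactly where the validators disagree: a strict validator would reject it, a tolerant one would accept it.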
