As a member of the PDF Association (pdfa.org) I saw a discussion about some
edge cases in XML that we interpret differently than Acrobat and BFO when
doing PDF/A validation, which I'm allowed to share:

<snip>
In this case we have a PDF with an XMP metadata stream containing two <rdf:RDF> 
entries, one with rdf:about set to a blank string, the other with it set to a 
UUID. The PDF/A specification (ISO-19005-1:2005(E) para 6.7.2) simply says that 
the stream must conform to the "XMP specification 2004 revision" which reads 
(p21):

The rdf:about attribute on the rdf:Description element is a required attribute 
that identifies the resource whose metadata this XMP describes. The value of 
this attribute must follow URI syntax and may be either:

●  an empty string (as in the example above), which means that the XMP is 
physically local to the resource being described. Applications must rely on 
knowledge of the file format to correctly associate the XMP with the resource.

●  a unique instance ID that is generated every time a file is saved. The next 
section gives guidelines for creating instance IDs.

The XMP packet must describe a single entity, and my reading of the above is 
that a combination of an empty string and a unique UUID can meet this requirement - this 
is how both our software and Acrobat X and XI behave. However it's ambiguous, 
and this clause was revised in the 2012 revision (ISO 16684-1:2011(E) para 7.4) 
to this:

If the XMP data model has an AboutURI (6.1, “XMP packets”), that same URI shall 
be the value of an rdf:about attribute in each top-level rdf:Description 
element. Otherwise, the rdf:about attributes for all top- level rdf:Description 
elements shall be present with an empty value. The rdf:about attribute shall 
not be used in more deeply nested rdf:Description elements.
For compatibility with very early XMP usage, it is recommended that XMP readers 
tolerate a missing rdf:about attribute and treat it as present with an empty 
value. It is also recommended that XMP readers tolerate a mix of empty and 
non-empty rdf:about values, as long as all non-empty values are identical.

Which means that a mix of an empty string and a unique UUID is technically 
incorrect, but it's recommended that it be tolerated for compatibility purposes.

I concede this is a very fine hair to split, but if you're writing software to 
validate or create PDF/A you have to make a decision one way or another. BFO and 
Acrobat X and XI think this is valid, PDFBox and the pdf-tools.com online validator 
lean the other way and classify this document as invalid. The end result is a 
document which might be PDF/A compliant, but no-one is really sure (and if 
anyone can give me a definitive answer please do - email me off-list if you 
want a copy of the document, I will need to get permission from my customer to 
forward it).
</snip>

I can also share a sample file if anyone is interested in working on that.
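
To make the difference between the strict reading and the tolerant reading
concrete, here is a rough sketch of the rule quoted from ISO 16684-1 para 7.4.
It is not how PDFBox, BFO or Acrobat implement the check. The function name
classify_rdf_about, the three result labels and the sample packet are made up
for illustration, and the sample is not the customer's document.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDF_ABOUT = f"{{{RDF_NS}}}about"


def classify_rdf_about(xmp_xml: str) -> str:
    """Hypothetical helper: returns 'strict', 'tolerated' or 'invalid'."""
    root = ET.fromstring(xmp_xml)
    values = []
    # Top-level rdf:Description elements are the direct children of rdf:RDF.
    for rdf in root.iter(f"{{{RDF_NS}}}RDF"):
        for desc in rdf.findall(f"{{{RDF_NS}}}Description"):
            values.append(desc.get(RDF_ABOUT))  # None if the attribute is missing

    missing = any(v is None for v in values)
    normalized = [v or "" for v in values]      # treat missing as empty
    non_empty = {v for v in normalized if v}

    if len(non_empty) > 1:
        return "invalid"      # two different non-empty values
    if missing or (non_empty and "" in normalized):
        return "tolerated"    # missing attribute, or empty mixed with one URI
    return "strict"           # all empty, or all the same URI


# Made-up packet body: one empty rdf:about and one UUID, as in the discussion.
SAMPLE = """<x:xmpmeta xmlns:x="adobe:ns:meta/">
  <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
    <rdf:Description rdf:about=""/>
    <rdf:Description rdf:about="uuid:0000-example"/>
  </rdf:RDF>
</x:xmpmeta>"""

print(classify_rdf_about(SAMPLE))
```

A validator that only accepts "strict" would reject the document in question,
while one that also accepts "tolerated" would pass it, which is roughly the
split between the tools described above.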

BR


Maruan Sahyoun
