Hi Maruan,

When we developed preflight and xmpbox we did not make any decision on that point. We could have considered it valid ... it is a matter of luck.
Changing that behavior will not be difficult. IMO, we should wait until someone is really sure before changing anything. Can you create an issue so that we do not forget this point?

KR,
Guillaume

On Fri, Aug 2, 2013 at 2:52 PM, Maruan Sahyoun <[email protected]> wrote:

> Being a member of the PDF Association (pdfa.org), I saw a discussion
> about some edge cases in XML that we interpret differently from Acrobat
> and BFO when doing PDF/A validation, which I'm allowed to share:
>
> <snip>
> In this case we have a PDF with an XMP metadata stream containing two
> <rdf:RDF> entries, one with rdf:about set to a blank string, the other with
> it set to a UUID. The PDF/A specification (ISO-19005-1:2005(E) para 6.7.2)
> simply says that the stream must conform to the "XMP specification 2004
> revision" which reads (p21):
>
> The rdf:about attribute on the rdf:Description element is a required
> attribute that identifies the resource whose metadata this XMP describes.
> The value of this attribute must follow URI syntax and may be either:
>
> ● an empty string (as in the example above), which means that the XMP is
> physically local to the resource being described. Applications must rely on
> knowledge of the file format to correctly associate the XMP with the
> resource.
>
> ● a unique instance ID that is generated every time a file is saved. The
> next section gives guidelines for creating instance IDs.
>
> The XMP packet must describe a single entity, and my reading of the above
> is that a combination of an empty string and a unique UUID can meet this
> requirement - this is how both our software and Acrobat X and XI behave.
> However it's ambiguous, and this clause was revised in the 2012 revision
> (ISO 16684-1:2011(E) para 7.4) to this:
>
> If the XMP data model has an AboutURI (6.1, “XMP packets”), that same URI
> shall be the value of an rdf:about attribute in each top-level
> rdf:Description element. Otherwise, the rdf:about attributes for all
> top-level rdf:Description elements shall be present with an empty value.
> The rdf:about attribute shall not be used in more deeply nested
> rdf:Description elements.
> For compatibility with very early XMP usage, it is recommended that XMP
> readers tolerate a missing rdf:about attribute and treat it as present with
> an empty value. It is also recommended that XMP readers tolerate a mix of
> empty and non-empty rdf:about values, as long as all non-empty values are
> identical.
>
> Which means that an empty string and a unique UUID are technically
> incorrect, but it's recommended they be tolerated for compatibility
> purposes.
>
> I concede this is a very fine hair to split, but if you're writing
> software to validate or create PDF/A you have to make a decision one way or
> another. BFO and Acrobat X and XI think this is valid; PDFBox and the
> pdf-tools.com online validator lean the other way and classify this
> document as invalid. The end result is a document which might be PDF/A
> compliant, but no-one is really sure (and if anyone can give me a
> definitive answer please do - email me off-list if you want a copy of the
> document, I will need to get permission from my customer to forward it).
> </snip>
>
> I can also share a sample file if anyone is interested in working on that.
>
> BR
>
> Maruan Sahyoun
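
For reference while the issue is open, here is a rough sketch of how the tolerance rule quoted from ISO 16684-1 para 7.4 could be expressed as a check. It is only an illustration in Python using the standard library, not what preflight or xmpbox do today. The function names `about_values` and `classify_about`, the three result labels, and the sample UUID are all made up for this example.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def about_values(xmp_xml):
    """Collect rdf:about of each top-level rdf:Description (None if missing).

    Hypothetical helper: it looks at direct children of every rdf:RDF element.
    """
    root = ET.fromstring(xmp_xml)
    values = []
    # iter() also yields the root itself, so an x:xmpmeta wrapper is optional.
    for rdf in root.iter("{%s}RDF" % RDF_NS):
        for desc in rdf.findall("{%s}Description" % RDF_NS):
            values.append(desc.get("{%s}about" % RDF_NS))
    return values


def classify_about(values):
    """Return 'strict', 'tolerated' or 'invalid' (labels are assumptions).

    strict    : all values present and identical (all empty or all the same URI)
    tolerated : missing attributes, or a mix of empty and one non-empty value
    invalid   : two or more different non-empty values
    """
    # A missing attribute is treated as present with an empty value.
    normalized = [v or "" for v in values]
    non_empty = {v for v in normalized if v}
    if len(non_empty) > 1:
        return "invalid"
    if None in values:
        return "tolerated"
    if non_empty and "" in normalized:
        return "tolerated"
    return "strict"


# Sample packet mirroring the case discussed: one empty value, one UUID.
SAMPLE = """<x:xmpmeta xmlns:x="adobe:ns:meta/">
 <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <rdf:Description rdf:about=""/>
  <rdf:Description rdf:about="uuid:hypothetical-0001"/>
 </rdf:RDF>
</x:xmpmeta>"""

print(classify_about(about_values(SAMPLE)))  # prints "tolerated"
```

Under this reading the disputed document falls in the "tolerated" bucket: not strictly conforming to the 2012 wording, but within what readers are recommended to accept. Whether a PDF/A validator should report that as valid or invalid is exactly the open question.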
