Hi Guillaume,

let's wait for a settlement on that topic @ pdfa.org. I'll open a ticket with a description as soon as there are comments on it.

BR
Maruan Sahyoun

On 02.08.2013 at 20:04, Guillaume Bailleul <[email protected]> wrote:

> Hi Maruan,
>
> When we developed preflight and xmpbox we did not take any decision on that
> point. We could have considered it valid ... it is a matter of luck.
>
> Changing that behavior will not be difficult.
>
> IMO, we should wait until someone is really sure before changing something.
>
> Can you create an issue so that we do not forget this point?
>
> KR,
>
> Guillaume
>
> On Fri, Aug 2, 2013 at 2:52 PM, Maruan Sahyoun <[email protected]> wrote:
>
>> being a member of the PDFAssociation (pdfa.org) there was a discussion
>> about some edge cases in xml that we interpret differently when doing PDF/A
>> validation than Acrobat and bro which I'm allowed to share
>>
>> <snip>
>> In this case we have a PDF with an XMP metadata stream containing two
>> <rdf:RDF> entries, one with rdf:about set to a blank string, the other with
>> it set to a UUID. The PDF/A specification (ISO-19005-1:2005(E) para 6.7.2)
>> simply says that the stream must conform to the "XMP specification 2004
>> revision" which reads (p21):
>>
>> The rdf:about attribute on the rdf:Description element is a required
>> attribute that identifies the resource whose metadata this XMP describes.
>> The value of this attribute must follow URI syntax and may be either:
>>
>> ● an empty string (as in the example above), which means that the XMP is
>> physically local to the resource being described. Applications must rely on
>> knowledge of the file format to correctly associate the XMP with the
>> resource.
>>
>> ● a unique instance ID that is generated every time a file is saved. The
>> next section gives guidelines for creating instance IDs.
>>
>> The XMP packet must describe a single entity, and my reading of the above
>> is a combination of empty-string and a unique UUID can meet this
>> requirement - this is how both our software and Acrobat X and XI behave.
>> However it's ambiguous, and this clause was revised in the 2012 revision
>> (ISO 16684-1:2011(E) para 7.4) to this:
>>
>> If the XMP data model has an AboutURI (6.1, “XMP packets”), that same URI
>> shall be the value of an rdf:about attribute in each top-level
>> rdf:Description element. Otherwise, the rdf:about attributes for all
>> top-level rdf:Description elements shall be present with an empty value. The
>> rdf:about attribute shall not be used in more deeply nested rdf:Description
>> elements.
>> For compatibility with very early XMP usage, it is recommended that XMP
>> readers tolerate a missing rdf:about attribute and treat it as present with
>> an empty value. It is also recommended that XMP readers tolerate a mix of
>> empty and non-empty rdf:about values, as long as all non-empty values are
>> identical.
>>
>> Which means that an empty string and a unique UUID are technically
>> incorrect, but it's recommended they be tolerated for compatibility
>> purposes.
>>
>> I concede this is a very fine hair to split, but if you're writing
>> software to validate or create PDF/A you have to make a decision one way or
>> another. BFO and Acrobat X and XI think this is valid, PDFBox and
>> pdf-tools.com online validator lean the other way and classify this document
>> as invalid. The end result is a document which might be PDF/A compliant,
>> but no-one is really sure (and if anyone can give me a definitive answer
>> please do - email me off-list if you want a copy of the document, I will
>> need to get permission from my customer to forward it).
>> </snip>
>>
>> I can also share a sample file if anyone is interested in working on that.
>>
>> BR
>>
>> Maruan Sahyoun
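
To make the rdf:about rule quoted above concrete, here is a minimal Python sketch of how a reader could classify a packet under one reading of the ISO 16684-1 para 7.4 text. It is not how preflight or xmpbox work, and it does not settle the PDF/A question. The function name `classify_rdf_about`, the three result labels and the sample UUID are assumptions made for illustration only.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDF_TAG = f"{{{RDF_NS}}}RDF"
DESC_TAG = f"{{{RDF_NS}}}Description"
ABOUT_ATTR = f"{{{RDF_NS}}}about"


def classify_rdf_about(xmp_xml: str) -> str:
    """Classify the rdf:about values of the top-level rdf:Description elements.

    Hypothetical labels (not taken from any specification):
      "strict"    - all values are empty, or all are the same non-empty URI
      "tolerated" - an attribute is missing, or empty values are mixed with
                    one single non-empty URI (the compatibility recommendation)
      "invalid"   - two or more different non-empty URIs
    """
    root = ET.fromstring(xmp_xml)

    # Collect rdf:about from the direct children of every rdf:RDF element.
    # A missing attribute is recorded as None.
    values = []
    for rdf in root.iter(RDF_TAG):
        for desc in rdf.findall(DESC_TAG):
            values.append(desc.get(ABOUT_ATTR))

    non_empty = {v for v in values if v}
    if len(non_empty) > 1:
        return "invalid"

    has_missing = any(v is None for v in values)
    has_empty = any(v == "" for v in values)
    if has_missing or (non_empty and has_empty):
        return "tolerated"
    return "strict"


# The case from the thread: one empty rdf:about and one UUID (made-up value).
MIXED = f"""<rdf:RDF xmlns:rdf="{RDF_NS}">
  <rdf:Description rdf:about=""/>
  <rdf:Description rdf:about="uuid:00000000-0000-0000-0000-000000000000"/>
</rdf:RDF>"""

print(classify_rdf_about(MIXED))  # prints "tolerated"
```

Whether "tolerated" should count as a pass or a failure for PDF/A validation is exactly the open question in this thread, so the sketch reports it as a separate outcome instead of deciding.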
