Re: PDF/A & xmp

Maruan Sahyoun Fri, 02 Aug 2013 11:11:20 -0700

Hi Guillaume,

Let's wait until that topic is settled at pdfa.org. I'll open a ticket with a
description as soon as there are comments on it.


BR

Maruan Sahyoun

On 02.08.2013 at 20:04, Guillaume Bailleul <[email protected]> wrote:

> Hi Maruan,
> 
> When we developed preflight and xmpbox we did not take any decision on that
> point. We could have considered it valid ... it is a matter of luck.
> 
> Changing that behavior will not be difficult.
> 
> IMO, we should wait until someone is really sure before changing anything.
> 
> Can you create an issue so that we don't forget this point?
> 
> KR,
> 
> Guillaume
> 
> 
> 
> 
> On Fri, Aug 2, 2013 at 2:52 PM, Maruan Sahyoun <[email protected]> wrote:
> 
>> As a member of the PDF Association (pdfa.org), I can share a discussion
>> about some edge cases in XML that we interpret differently from Acrobat
>> and BFO when doing PDF/A validation:
>> 
>> <snip>
>> In this case we have a PDF with an XMP metadata stream containing two
>> <rdf:RDF> entries, one with rdf:about set to a blank string, the other with
>> it set to a UUID. The PDF/A specification (ISO-19005-1:2005(E) para 6.7.2)
>> simply says that the stream must conform to the "XMP specification 2004
>> revision" which reads (p21):
>> 
>> The rdf:about attribute on the rdf:Description element is a required
>> attribute that identifies the resource whose metadata this XMP describes.
>> The value of this attribute must follow URI syntax and may be either:
>> 
>> ●  an empty string (as in the example above), which means that the XMP is
>> physically local to the resource being described. Applications must rely on
>> knowledge of the file format to correctly associate the XMP with the
>> resource.
>> 
>> ●  a unique instance ID that is generated every time a file is saved. The
>> next section gives guidelines for creating instance IDs.
>> 
>> The XMP packet must describe a single entity, and my reading of the above
>> is that a combination of an empty string and a unique UUID can meet this
>> requirement - this is how both our software and Acrobat X and XI behave.
>> However it's ambiguous, and this clause was revised in the 2012 revision
>> (ISO 16684-1:2011(E) para 7.4) to this:
>> 
>> If the XMP data model has an AboutURI (6.1, “XMP packets”), that same URI
>> shall be the value of an rdf:about attribute in each top-level
>> rdf:Description element. Otherwise, the rdf:about attributes for all
>> top-level rdf:Description elements shall be present with an empty value.
>> The rdf:about attribute shall not be used in more deeply nested
>> rdf:Description elements.
>> For compatibility with very early XMP usage, it is recommended that XMP
>> readers tolerate a missing rdf:about attribute and treat it as present with
>> an empty value. It is also recommended that XMP readers tolerate a mix of
>> empty and non-empty rdf:about values, as long as all non-empty values are
>> identical.
>> 
>> Which means that combining an empty string with a unique UUID is
>> technically incorrect, but it's recommended that readers tolerate it for
>> compatibility purposes.
>> 
>> I concede this is a very fine hair to split, but if you're writing
>> software to validate or create PDF/A you have to make a decision one way or
>> another. BFO and Acrobat X and XI think this is valid; PDFBox and the
>> pdf-tools.com online validator lean the other way and classify this document
>> as invalid. The end result is a document which might be PDF/A compliant,
>> but no-one is really sure (and if anyone can give me a definitive answer
>> please do - email me off-list if you want a copy of the document, I will
>> need to get permission from my customer to forward it).
>> </snip>
>> 
>> I can also share a sample file if anyone is interested in working on that.
>> 
>> BR
>> 
>> 
>> Maruan Sahyoun
>> 
>>
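
To make the two readings discussed above concrete, here is a minimal sketch in Python using only the standard library. It is not PDFBox, preflight, or xmpbox code; the function names, the `tolerant` flag, and the sample UUID are all hypothetical. With `tolerant=False` it applies the "shall" wording quoted from ISO 16684-1 para 7.4 (every top-level rdf:Description carries the same rdf:about value), and with `tolerant=True` it applies the recommended reader behaviour (a missing attribute counts as empty, and empty values may be mixed with non-empty ones as long as the non-empty ones are identical). Which of the two a PDF/A validator should use is exactly the open question in this thread, so the sketch does not settle it.

```python
import xml.etree.ElementTree as ET

RDF = "{http://www.w3.org/1999/02/22-rdf-syntax-ns#}"


def about_values(xmp_xml):
    """Return rdf:about of every top-level rdf:Description (None if missing)."""
    root = ET.fromstring(xmp_xml)
    values = []
    # iter() also yields the root itself, so a bare <rdf:RDF> document works.
    for rdf in root.iter(RDF + "RDF"):
        # findall() with a plain tag only matches direct children, i.e. the
        # top-level descriptions; nested ones are ignored.
        for desc in rdf.findall(RDF + "Description"):
            values.append(desc.get(RDF + "about"))
    return values


def check_about(xmp_xml, tolerant=True):
    """Hypothetical check of rdf:about consistency (illustration only)."""
    values = about_values(xmp_xml)
    if tolerant:
        # Missing or empty values are ignored; all remaining values must agree.
        non_empty = {v for v in values if v}
        return len(non_empty) <= 1
    # Strict reading: attribute present everywhere, same value everywhere.
    if any(v is None for v in values):
        return False
    return len(set(values)) <= 1


# Hypothetical packet resembling the case described: one empty rdf:about
# and one UUID (the UUID is made up for this example).
SAMPLE = """<x:xmpmeta xmlns:x="adobe:ns:meta/">
  <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
    <rdf:Description rdf:about=""/>
    <rdf:Description rdf:about="uuid:0000-hypothetical"/>
  </rdf:RDF>
</x:xmpmeta>"""

print(check_about(SAMPLE, tolerant=True))   # True
print(check_about(SAMPLE, tolerant=False))  # False
```

The sample packet passes the tolerant check and fails the strict one, which mirrors the split between the validators mentioned above.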
