Hi,

Can you share the full FO? I tried to reproduce your issue, but my file is 
PDF/A-3b compliant. I used https://avepdf.com/en/pdfa-validation to verify 
the PDF.

Regards

From: Jörn Willhöft <j...@willhoeft-it.com>
Sent: Tuesday, November 5, 2024 5:26 AM
To: fop-users@xmlgraphics.apache.org
Subject: Namespaces are dropped in XMP stream. XML invalid

Dear list members,

I need to create PDF/A-3B compliant documents with special references to an 
embedded document (ZUGFeRD-compliant invoices). Currently, these fail VeraPDF 
validation with some very vague and rather misleading errors:
* "Specification: ISO 19005-3:2012, Clause: 6.6.2.1, Test number: 5
All metadata streams present in the PDF shall conform to the XMP Specification. 
The XMP package must be encoded as UTF-8"
* "Specification: ISO 19005-3:2012, Clause: 6.6.4, Test number: 1
The PDF/A version and conformance level of a file shall be specified using the 
PDF/A Identification extension schema"
* "Specification: ISO 19005-3:2012, Clause: 6.6.2.1, Test number: 4
All metadata streams present in the PDF shall conform to the XMP Specification. 
All content of all XMP packets shall be well-formed, as defined by Extensible 
Markup Language (XML) 1.0 (Third Edition), 2.1, and the RDF/XML Syntax 
Specification (Revised)"
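The third error (clause 6.6.2.1, test 4) concerns XML well-formedness. One way to check a metadata packet for a missing namespace declaration outside VeraPDF is to parse it with a namespace-aware XML parser, which rejects any prefix that has no matching xmlns declaration in scope. The sketch below uses only the JDK's javax.xml.parsers SAX API; the class name, method name, and the shortened sample packet are hypothetical, and this is not VeraPDF's own check.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class XmpWellFormedCheck {

    /**
     * Parses an XMP packet with a namespace-aware SAX parser.
     * Returns null if it parses cleanly, otherwise the parser's error message.
     */
    static String checkNamespaceWellFormed(String xmp) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            // Namespace awareness makes an unbound prefix (e.g. pdfaSchema:) a fatal error.
            factory.setNamespaceAware(true);
            factory.newSAXParser().parse(new InputSource(new StringReader(xmp)), new DefaultHandler());
            return null;
        } catch (SAXException e) {
            return e.getMessage();
        } catch (Exception e) {
            // ParserConfigurationException or IOException: report it rather than hide it.
            return e.toString();
        }
    }

    public static void main(String[] args) {
        // Hypothetical, heavily shortened packet: pdfaSchema is used but never declared.
        String broken = "<x:xmpmeta xmlns:x=\"adobe:ns:meta/\">"
                + "<rdf:RDF xmlns:rdf=\"http://www.w3.org/1999/02/22-rdf-syntax-ns#\">"
                + "<rdf:Description rdf:about=\"\"><pdfaSchema:property/></rdf:Description>"
                + "</rdf:RDF></x:xmpmeta>";
        // Same packet with the missing declaration added on rdf:Description.
        String fixed = broken.replace("<rdf:Description rdf:about=\"\">",
                "<rdf:Description rdf:about=\"\" xmlns:pdfaSchema=\"http://www.aiim.org/pdfa/ns/schema#\">");
        System.out.println("broken: " + checkNamespaceWellFormed(broken));
        System.out.println("fixed:  " + checkNamespaceWellFormed(fixed)); // prints "fixed:  null"
    }
}
```

The "broken" line prints the parser's complaint about the unbound prefix; the exact wording depends on the JDK's bundled parser.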

I finally tracked this down to FOP, which drops some necessary namespace 
declarations from the generated XMP stream. This can be reproduced with the 
corresponding example from the FOP homepage 
(https://xmlgraphics.apache.org/fop/2.10/metadata.html). For example, here is 
the FO:

<fo:simple-page-master master-name="simple">
  <fo:region-body/>
  <pdf:page page-numbers="*">
    <x:xmpmeta xmlns:x="adobe:ns:meta/">
      <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
               xmlns:abc="http://www.abc.de/abc/">
        <rdf:Description rdf:about="" abc:def="val"/>
        <rdf:Description rdf:about=""
                         xmlns:pdfaExtension="http://www.aiim.org/pdfa/ns/extension/"
                         xmlns:pdfaSchema="http://www.aiim.org/pdfa/ns/schema#"
                         xmlns:pdfaProperty="http://www.aiim.org/pdfa/ns/property#">
          <pdfaExtension:schemas>
            <rdf:Bag>
              <rdf:li rdf:parseType="Resource">
                <pdfaSchema:property>
                  <rdf:Seq>
                    <rdf:li rdf:parseType="Resource">
                      <pdfaProperty:name>split</pdfaProperty:name>
                    </rdf:li>
                  </rdf:Seq>
                </pdfaSchema:property>
              </rdf:li>
            </rdf:Bag>
          </pdfaExtension:schemas>
        </rdf:Description>
      </rdf:RDF>
    </x:xmpmeta>
  </pdf:page>
</fo:simple-page-master>
And this is returned when I run pdfinfo -meta example.pdf:
<?xpacket begin="" id="W5M0MpCehiHzreSzNTczkc9d"?><x:xmpmeta xmlns:x="adobe:ns:meta/">
  <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
     <rdf:RDF xmlns:abc="http://www.abc.de/abc/" abc:def="val" rdf:about=""/>
     <rdf:RDF xmlns:dc="http://purl.org/dc/elements/1.1/" rdf:about="">
        <dc:format>application/pdf</dc:format>
        <dc:language>
           <rdf:Bag>
              <rdf:li>x-unknown</rdf:li>
           </rdf:Bag>
        </dc:language>
        <dc:date>
           <rdf:Seq>
              <rdf:li>2024-11-05T11:46:27+01:00</rdf:li>
           </rdf:Seq>
        </dc:date>
     </rdf:RDF>
     <rdf:RDF xmlns:pdf="http://ns.adobe.com/pdf/1.3/" rdf:about="">
        <pdf:Producer>Apache FOP Version 2.10</pdf:Producer>
        <pdf:PDFVersion>1.4</pdf:PDFVersion>
     </rdf:RDF>
     <rdf:RDF xmlns:pdfaExtension="http://www.aiim.org/pdfa/ns/extension/" rdf:about="">
        <pdfaExtension:schemas>
           <rdf:Bag>
              <rdf:li rdf:parseType="Resource">
                 <pdfaSchema:property>
                    <rdf:Seq>
                       <rdf:li rdf:parseType="Resource">
                          <pdfaProperty:name>split</pdfaProperty:name>
                       </rdf:li>
                    </rdf:Seq>
                 </pdfaSchema:property>
              </rdf:li>
           </rdf:Bag>
        </pdfaExtension:schemas>
     </rdf:RDF>
     <rdf:RDF xmlns:pdfaid="http://www.aiim.org/pdfa/ns/id/" rdf:about="">
        <pdfaid:conformance>B</pdfaid:conformance>
        <pdfaid:part>3</pdfaid:part>
     </rdf:RDF>
     <rdf:RDF xmlns:xmp="http://ns.adobe.com/xap/1.0/" rdf:about="">
        <xmp:MetadataDate>2024-11-05T11:46:27+01:00</xmp:MetadataDate>
        <xmp:CreateDate>2024-11-05T11:46:27+01:00</xmp:CreateDate>
     </rdf:RDF>
  </rdf:RDF>
</x:xmpmeta>
<?xpacket end="r"?>
As you can see, the two namespace declarations 
xmlns:pdfaSchema="http://www.aiim.org/pdfa/ns/schema#" and 
xmlns:pdfaProperty="http://www.aiim.org/pdfa/ns/property#"
are dropped from the metadata, while the prefixes pdfaSchema and pdfaProperty 
are still used. Because these prefixes are no longer bound to any namespace, 
the XMP packet is not well-formed XML, and so the PDF is not valid PDF/A.
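To see exactly which prefixes lost their declarations, here is a small audit sketch. It is a hypothetical helper (the class UndeclaredPrefixAudit and its method are made up for illustration) that uses only the JDK's SAX API: it parses the packet without namespace processing, so unbound prefixes do not abort the parse, tracks xmlns:prefix declarations per element scope, and reports every prefix used on an element or attribute that has no binding in scope. It ignores default namespaces (xmlns="...") and is a diagnostic, not a fix for FOP. The sample packet is a shortened version of the pdfinfo output above, using rdf:Description in place of the nested rdf:RDF elements.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.Set;
import java.util.TreeSet;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class UndeclaredPrefixAudit {

    /** Returns every prefix used on an element or attribute without an in-scope xmlns:prefix declaration. */
    static Set<String> findUndeclaredPrefixes(String xml) {
        Set<String> undeclared = new TreeSet<>();
        Deque<Set<String>> scopes = new ArrayDeque<>(); // one set of bound prefixes per open element
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                // Start from the parent's bindings ("xml" is always bound), then add this element's own.
                Set<String> inScope = new HashSet<>(scopes.isEmpty() ? Set.of("xml") : scopes.peek());
                for (int i = 0; i < atts.getLength(); i++) {
                    if (atts.getQName(i).startsWith("xmlns:")) {
                        inScope.add(atts.getQName(i).substring("xmlns:".length()));
                    }
                }
                scopes.push(inScope);
                noteIfUndeclared(qName, inScope);
                for (int i = 0; i < atts.getLength(); i++) {
                    if (!atts.getQName(i).startsWith("xmlns:")) {
                        noteIfUndeclared(atts.getQName(i), inScope);
                    }
                }
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                scopes.pop(); // the element's declarations go out of scope
            }

            private void noteIfUndeclared(String name, Set<String> inScope) {
                int colon = name.indexOf(':');
                if (colon > 0 && !inScope.contains(name.substring(0, colon))) {
                    undeclared.add(name.substring(0, colon));
                }
            }
        };
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            // Not namespace-aware: xmlns attributes arrive as ordinary attributes and unbound prefixes are not fatal.
            factory.setNamespaceAware(false);
            factory.newSAXParser().parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalArgumentException("input is not well-formed XML", e);
        }
        return undeclared;
    }

    public static void main(String[] args) {
        // Shortened, hypothetical excerpt of the pdfinfo output: only pdfaExtension is declared.
        String xmp = "<x:xmpmeta xmlns:x=\"adobe:ns:meta/\">"
                + "<rdf:RDF xmlns:rdf=\"http://www.w3.org/1999/02/22-rdf-syntax-ns#\">"
                + "<rdf:Description xmlns:pdfaExtension=\"http://www.aiim.org/pdfa/ns/extension/\" rdf:about=\"\">"
                + "<pdfaExtension:schemas><rdf:Bag><rdf:li rdf:parseType=\"Resource\">"
                + "<pdfaSchema:property><rdf:Seq><rdf:li rdf:parseType=\"Resource\">"
                + "<pdfaProperty:name>split</pdfaProperty:name>"
                + "</rdf:li></rdf:Seq></pdfaSchema:property>"
                + "</rdf:li></rdf:Bag></pdfaExtension:schemas>"
                + "</rdf:Description></rdf:RDF></x:xmpmeta>";
        System.out.println(findUndeclaredPrefixes(xmp)); // prints [pdfaProperty, pdfaSchema]
    }
}
```

A TreeSet keeps the report sorted, so the output is stable regardless of document order.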
This is pretty unfortunate, as I don't have any workaround for it. I am using 
Apache FOP 2.10 within Apache Camel 4.8.1. Any help would be greatly 
appreciated.


With kind regards,

   Jörn Willhöft

________________________________
Joao Andre Goncalves
Customer Developer

t | m 07733161880
jgoncal...@smartcommunications.com
smartcommunications.com<https://www.smartcommunications.com/>
