Jochen Stärk created PDFBOX-5976:
------------------------------------

             Summary: DomXmpParser incorrectly expects namespaces on attribute level
                 Key: PDFBOX-5976
                 URL: https://issues.apache.org/jira/browse/PDFBOX-5976
             Project: PDFBox
          Issue Type: Bug
          Components: XmpBox
    Affects Versions: 3.0.4 PDFBox
            Reporter: Jochen Stärk
         Attachments: AN-10005_v28_2025-03-19-2.pdf, AN-10005_v28_2025-03-19x-1.pdf

When trying to determine the PDF/A version with code like this:

{{// try-with-resources closes the document even if parsing throws}}
{{try (PDDocument document = Loader.loadPDF(new File("AN-10005_v28_2025-03-19.pdf"))) {}}
{{    PDDocumentCatalog catalog = document.getDocumentCatalog();}}
{{    PDMetadata metadata = catalog.getMetadata();}}
{{    DomXmpParser xmpParser = new DomXmpParser();}}
{{    XMPMetadata xmp = xmpParser.parse(metadata.createInputStream());}}
{{    PDFAIdentificationSchema pdfaSchema = xmp.getPDFAIdentificationSchema();}}
{{    if (pdfaSchema != null) {}}
{{        System.out.println("It's a PDF A-" + pdfaSchema.getPart());}}
{{    }}}
{{} catch (XmpParsingException | IOException e) {}}
{{    e.printStackTrace();}}
{{}}}

on the attached (and valid) PDF/A-3b file AN-10005_v28_2025-03-19-2.pdf, PDFBox incorrectly fails with:

{{org.apache.xmpbox.xml.XmpParsingException: Schema is not set in this document : http://www.aiim.org/pdfa/ns/id/}}
{{    at org.apache.xmpbox.xml.DomXmpParser.checkPropertyDefinition(DomXmpParser.java:920)}}
{{    at org.apache.xmpbox.xml.DomXmpParser.parseDescriptionRootAttr(DomXmpParser.java:276)}}
{{    at org.apache.xmpbox.xml.DomXmpParser.parseDescriptionRoot(DomXmpParser.java:247)}}
{{    at org.apache.xmpbox.xml.DomXmpParser.parse(DomXmpParser.java:201)}}
{{    at de.usegroup.Main.main(Main.java:25)}}

 

After changing the metadata stream with iText RUPS from

{{<rdf:RDF xmlns:pdf="http://ns.adobe.com/pdf/1.3/"}}
{{         xmlns:pdfaid="http://www.aiim.org/pdfa/ns/id/"}}
{{         xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">}}
{{  <rdf:Description rdf:about="" pdfaid:part="3" pdfaid:conformance="B" />}}
{{  <rdf:Description rdf:about="" pdf:Producer="WeasyPrint 64.1" />}}
{{</rdf:RDF>}}

to

{{  <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">}}
{{    <rdf:Description rdf:about=""}}
{{        xmlns:pdfaid="http://www.aiim.org/pdfa/ns/id/"}}
{{        xmlns:pdf="http://ns.adobe.com/pdf/1.3/"}}
{{        xmlns:xmp="http://ns.adobe.com/xap/1.0/"}}
{{      pdfaid:conformance="B"}}
{{      pdfaid:part="3"}}
{{      pdf:Producer="WeasyPrint 64.1; modified using iText® Core 7.2.5 (AGPL version) ©2000-2023 iText Group NV"}}
{{      xmp:ModifyDate="2025-03-21T08:16:58+01:00"/>}}
{{  </rdf:RDF>}}

putting the namespace declarations on the rdf:Description element (AN-10005_v28_2025-03-19x-1.pdf), parsing works.

The issue is: declaring the namespaces on the root element, rdf:RDF, should be sufficient, because XML namespace declarations are in scope for all descendant elements and their attributes. The first example should therefore also work.
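To illustrate the scoping rule independently of PDFBox, here is a minimal sketch that uses only the JDK's namespace-aware DOM parser. The class name NamespaceScopeDemo and the helper firstDescription are made up for this example, and the XML is a shortened version of the first metadata stream above. It is not a fix for DomXmpParser; it only shows that a prefix declared on rdf:RDF resolves for attributes on a child rdf:Description.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical demo class, not part of PDFBox or XmpBox.
class NamespaceScopeDemo {
    static final String RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#";
    static final String PDFAID_NS = "http://www.aiim.org/pdfa/ns/id/";

    // Shortened form of the failing stream: namespaces declared on rdf:RDF only.
    static final String XML =
        "<rdf:RDF xmlns:pdfaid=\"" + PDFAID_NS + "\" xmlns:rdf=\"" + RDF_NS + "\">"
        + "<rdf:Description rdf:about=\"\" pdfaid:part=\"3\" pdfaid:conformance=\"B\"/>"
        + "</rdf:RDF>";

    // Parses the XML with namespace awareness and returns the first rdf:Description.
    static Element firstDescription(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        Document doc = factory.newDocumentBuilder()
            .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        return (Element) doc.getElementsByTagNameNS(RDF_NS, "Description").item(0);
    }

    public static void main(String[] args) throws Exception {
        Element description = firstDescription(XML);
        // The attribute is found by namespace URI although the prefix
        // was declared on the parent element.
        System.out.println(description.getAttributeNS(PDFAID_NS, "part"));
        // Prefix lookup walks up to the ancestor that declares it.
        System.out.println(description.lookupNamespaceURI("pdfaid"));
    }
}
```

With the JDK's built-in parser this should print 3 followed by the pdfaid namespace URI, which is the behaviour I would expect DomXmpParser to rely on instead of looking for the declaration on rdf:Description itself.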

 

When searching for similar issues I had the impression this may be similar to 
your issue #2219



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@pdfbox.apache.org
For additional commands, e-mail: dev-h...@pdfbox.apache.org
