[ https://issues.apache.org/jira/browse/PDFBOX-5976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17937684#comment-17937684 ]
Jochen Stärk commented on PDFBOX-5976:
--------------------------------------

Hello, in conjunction with preflight, like this:

{{<dependency>}}
{{  <groupId>org.apache.pdfbox</groupId>}}
{{  <artifactId>preflight</artifactId>}}
{{  <version>3.0.5</version>}}
{{  <scope>system</scope>}}
{{  <systemPath>${project.basedir}/preflight-app-3.0.5-20250322.104829-67.jar</systemPath>}}
{{</dependency>}}
{{<dependency>}}
{{  <groupId>org.apache.pdfbox</groupId>}}
{{  <artifactId>pdfbox</artifactId>}}
{{  <version>3.0.5</version>}}
{{  <scope>system</scope>}}
{{  <systemPath>${project.basedir}/pdfbox-app-3.0.5-20250322.104829-66.jar</systemPath>}}
{{</dependency>}}

I can confirm it works, thanks a lot! I assume the -app JARs include their dependencies, i.e. what I previously only knew as "shaded" JARs?

FYI, I only need this functionality on the very outskirts of mustangproject.org. Until your 3.0.5 release, I am considering the following XPath-based workaround:

{{InputStream exportXMPMetadata = metadata.exportXMPMetadata();}}
{{DocumentBuilderFactory builderFactory = DocumentBuilderFactory.newInstance();}}
{{builderFactory.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);}}
{{builderFactory.setNamespaceAware(true);}}
{{DocumentBuilder builder = builderFactory.newDocumentBuilder();}}
{{Document xmlDocument = builder.parse(exportXMPMetadata);}}
{{XPath xPath = XPathFactory.newInstance().newXPath();}}
{{// Bind only the pdfaid prefix that the expression below uses}}
{{xPath.setNamespaceContext(new NamespaceContext() {}}
{{    @Override}}
{{    public String getNamespaceURI(String prefix) {}}
{{        return "pdfaid".equals(prefix) ? "http://www.aiim.org/pdfa/ns/id/" : XMLConstants.NULL_NS_URI;}}
{{    }}}
{{    @Override}}
{{    public String getPrefix(String namespaceURI) {}}
{{        return "http://www.aiim.org/pdfa/ns/id/".equals(namespaceURI) ? "pdfaid" : null;}}
{{    }}}
{{    @Override}}
{{    public Iterator<String> getPrefixes(String namespaceURI) {}}
{{        return "http://www.aiim.org/pdfa/ns/id/".equals(namespaceURI)}}
{{                ? Collections.singletonList("pdfaid").iterator()}}
{{                : Collections.<String>emptyIterator();}}
{{    }}}
{{});}}
{{// Selects every pdfaid:part attribute, regardless of where its prefix was declared}}
{{String expression = "//@pdfaid:part";}}
{{NodeList partAttributes = (NodeList) xPath.compile(expression).evaluate(xmlDocument, XPathConstants.NODESET);}}
{{for (int n = 0; n < partAttributes.getLength(); n++) {}}
{{    System.out.println("PDF/A part: " + partAttributes.item(n).getNodeValue().trim());}}
{{}}}

> DomXmpParser incorrectly expects namespaces on attribute level
> --------------------------------------------------------------
>
> Key: PDFBOX-5976
> URL: https://issues.apache.org/jira/browse/PDFBOX-5976
> Project: PDFBox
> Issue Type: Bug
> Components: XmpBox
> Affects Versions: 2.0.33, 3.0.4 PDFBox
> Reporter: Jochen Stärk
> Assignee: Tilman Hausherr
> Priority: Major
> Labels: xml
> Fix For: 2.0.34, 3.0.5 PDFBox, 4.0.0
>
> Attachments: AN-10005_v28_2025-03-19-2.pdf, AN-10005_v28_2025-03-19x-1.pdf
>
>
> When trying to determine the PDF/A version like this:
> {{PDDocument document = null;}}
> {{try {}}
> {{    document = Loader.loadPDF(new File("AN-10005_v28_2025-03-19.pdf"));}}
> {{    PDDocumentCatalog catalog = document.getDocumentCatalog();}}
> {{    PDMetadata metadata = catalog.getMetadata();}}
> {{    DomXmpParser xmpParser = new DomXmpParser();}}
> {{    XMPMetadata xmp = xmpParser.parse(metadata.createInputStream());}}
> {{    PDFAIdentificationSchema pdfaSchema = xmp.getPDFAIdentificationSchema();}}
> {{    if (pdfaSchema != null) {}}
> {{        System.out.println("It's a PDF A-" + pdfaSchema.getPart());}}
> {{    }}}
> {{    document.close();}}
> {{} catch (XmpParsingException e) {}}
> {{    e.printStackTrace();}}
> {{} catch (IOException e) {}}
> {{    e.printStackTrace();}}
> {{}}}
> on the attached (and valid) PDF/A-3b file AN-10005_v28_2025-03-19-2.pdf, PDFBox incorrectly fails with:
>
> {{org.apache.xmpbox.xml.XmpParsingException: Schema is not set in this document : http://www.aiim.org/pdfa/ns/id/}}
> {{ at org.apache.xmpbox.xml.DomXmpParser.checkPropertyDefinition(DomXmpParser.java:920)}}
> {{ at org.apache.xmpbox.xml.DomXmpParser.parseDescriptionRootAttr(DomXmpParser.java:276)}}
> {{ at org.apache.xmpbox.xml.DomXmpParser.parseDescriptionRoot(DomXmpParser.java:247)}}
> {{ at org.apache.xmpbox.xml.DomXmpParser.parse(DomXmpParser.java:201)}}
> {{ at de.usegroup.Main.main(Main.java:25)}}
>
> After editing the metadata stream with iText RUPS from
> {{<rdf:RDF xmlns:pdf="http://ns.adobe.com/pdf/1.3/" xmlns:pdfaid="http://www.aiim.org/pdfa/ns/id/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">}}
> {{<rdf:Description rdf:about="" pdfaid:part="3" pdfaid:conformance="B" />}}
> {{<rdf:Description rdf:about="" pdf:Producer="WeasyPrint 64.1" />}}
> {{</rdf:RDF>}}
> to
> {{<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">}}
> {{  <rdf:Description rdf:about=""}}
> {{      xmlns:pdfaid="http://www.aiim.org/pdfa/ns/id/"}}
> {{      xmlns:pdf="http://ns.adobe.com/pdf/1.3/"}}
> {{      xmlns:xmp="http://ns.adobe.com/xap/1.0/"}}
> {{      pdfaid:conformance="B"}}
> {{      pdfaid:part="3"}}
> {{      pdf:Producer="WeasyPrint 64.1; modified using iText® Core 7.2.5 (AGPL version) ©2000-2023 iText Group NV"}}
> {{      xmp:ModifyDate="2025-03-21T08:16:58+01:00"/>}}
> {{</rdf:RDF>}}
> i.e. moving the namespace declarations onto rdf:Description (AN-10005_v28_2025-03-19x-1.pdf), parsing works.
> The issue is that it should be sufficient to declare the namespaces on the root element, rdf:RDF; i.e. the first example should also work.
>
> When searching for similar issues, I had the impression this may be similar to PDFBOX-2913.
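For reference, the namespace scoping the description relies on can be checked with the JDK alone. The following is a minimal sketch, not PDFBox code: the class `PdfaIdFromXmp` and its method `findPdfaId` are hypothetical names, and it assumes the XMP packet is already available as a string (for example read from `metadata.exportXMPMetadata()` as in the workaround above). A namespace-aware DOM parser resolves `pdfaid:part` on `rdf:Description` through the declaration inherited from `rdf:RDF`, so the failing packet yields the PDF/A part and conformance without any `NamespaceContext`.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

// Hypothetical helper, plain JDK only (no PDFBox): reads pdfaid:part/conformance from an XMP packet.
public class PdfaIdFromXmp {

    static final String RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#";
    static final String PDFAID_NS = "http://www.aiim.org/pdfa/ns/id/";

    /** Returns part + conformance (e.g. "3B"), or null if no rdf:Description has a pdfaid:part attribute. */
    static String findPdfaId(String xmp) {
        Document doc;
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
            // Without namespace awareness, getAttributeNS() could not resolve prefixes at all
            factory.setNamespaceAware(true);
            doc = factory.newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xmp.getBytes(StandardCharsets.UTF_8)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Cannot parse XMP packet", e);
        }

        NodeList descriptions = doc.getElementsByTagNameNS(RDF_NS, "Description");
        for (int i = 0; i < descriptions.getLength(); i++) {
            Element desc = (Element) descriptions.item(i);
            // The pdfaid prefix may be declared on an ancestor such as rdf:RDF;
            // per Namespaces in XML it is still in scope for this element's attributes.
            if (desc.hasAttributeNS(PDFAID_NS, "part")) {
                // getAttributeNS returns "" for a missing conformance attribute
                return desc.getAttributeNS(PDFAID_NS, "part").trim()
                        + desc.getAttributeNS(PDFAID_NS, "conformance").trim();
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // The packet from the failing file: namespaces declared on rdf:RDF only
        String rootDeclared = "<rdf:RDF xmlns:pdf=\"http://ns.adobe.com/pdf/1.3/\""
                + " xmlns:pdfaid=\"http://www.aiim.org/pdfa/ns/id/\""
                + " xmlns:rdf=\"http://www.w3.org/1999/02/22-rdf-syntax-ns#\">"
                + "<rdf:Description rdf:about=\"\" pdfaid:part=\"3\" pdfaid:conformance=\"B\" />"
                + "<rdf:Description rdf:about=\"\" pdf:Producer=\"WeasyPrint 64.1\" />"
                + "</rdf:RDF>";
        System.out.println("PDF/A-" + findPdfaId(rootDeclared)); // prints "PDF/A-3B"
    }
}
```

The same call also returns "3B" for the variant with the declarations moved onto rdf:Description, which is the behavior one would expect from a namespace-correct parser in both cases. Like the XPath workaround, this sketch only handles the attribute (abbreviated) form of the pdfaid properties, not `<pdfaid:part>3</pdfaid:part>` child elements.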