[ https://issues.apache.org/jira/browse/PDFBOX-5977?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17937447#comment-17937447 ]
Tilman Hausherr commented on PDFBOX-5977:
-----------------------------------------

One thing to do will be to add {{builderFactory.setNamespaceAware(true);}} to {{XMLUtil.parse()}}, because without it the namespace information is lost.

And then some changes in {{XMPMetadata.getSchemas()}} at line 645:
{code:java}
else if (attribute.getNamespaceURI() != null
        && nsMappings.containsKey(attribute.getNamespaceURI())
        && name.contains(":"))
{
    Class<?> schemaClass = nsMappings.get(attribute.getNamespaceURI());
    try
    {
        String prefix = name.substring(0, name.indexOf(':'));
        Constructor<?> ctor = schemaClass
                .getDeclaredConstructor(new Class[] { Element.class, String.class });
        retval.add((XMPSchema)ctor.newInstance(new Object[] { schema, prefix }));
        found = true;
    }
    catch(NoSuchMethodException e)
    {
        throw new IOException(
                "Error: Class " + schemaClass.getName()
                + " must have a constructor with the signature of "
                + schemaClass.getName() + "( org.w3c.dom.Element, java.lang.String )");
    }
    catch(Exception e)
    {
        e.printStackTrace();
        throw new IOException(e.getMessage());
    }
}
{code}
This would have to be refactored because the previous block is very similar, but I haven't done that yet so that the change itself is easy to see. There's also a bug in the existing code: the "schema" object is created several times instead of only once.
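To illustrate why {{setNamespaceAware(true)}} matters here, the following is a minimal standalone sketch using only the JDK's DOM parser, not the actual {{XMLUtil.parse()}} code. The class name {{NamespaceAwareDemo}}, its method and the trimmed-down XMP string are made up for this illustration. It parses the same kind of {{rdf:Description}} element with and without namespace awareness and reports what {{getNamespaceURI()}} returns for the {{pdfaid:part}} attribute, which is the value the proposed {{else if}} branch depends on.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Attr;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical demo class; not part of JempBox.
public class NamespaceAwareDemo
{
    public static final String PDFAID_NS = "http://www.aiim.org/pdfa/ns/id/";

    // Trimmed-down version of the XMP from the issue, without whitespace
    // between elements so that the first child of rdf:RDF is the element.
    private static final String XMP =
            "<rdf:RDF xmlns:rdf=\"http://www.w3.org/1999/02/22-rdf-syntax-ns#\""
            + " xmlns:pdfaid=\"" + PDFAID_NS + "\">"
            + "<rdf:Description pdfaid:conformance=\"B\" pdfaid:part=\"3\" rdf:about=\"\"/>"
            + "</rdf:RDF>";

    /** Returns the namespace URI the parser reports for the pdfaid:part attribute. */
    public static String namespaceOfPartAttribute(boolean namespaceAware) throws Exception
    {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        // The JDK default is false, i.e. prefixes are kept as plain text only.
        factory.setNamespaceAware(namespaceAware);
        DocumentBuilder builder = factory.newDocumentBuilder();
        Document doc = builder.parse(
                new ByteArrayInputStream(XMP.getBytes(StandardCharsets.UTF_8)));
        Element description = (Element) doc.getDocumentElement().getFirstChild();
        // Lookup by qualified name works in both modes.
        Attr part = description.getAttributeNode("pdfaid:part");
        return part.getNamespaceURI();
    }

    public static void main(String[] args) throws Exception
    {
        System.out.println("namespace-unaware: " + namespaceOfPartAttribute(false));
        System.out.println("namespace-aware:   " + namespaceOfPartAttribute(true));
    }
}
```

With the JDK's bundled parser, the namespace-unaware run should report {{null}} and the namespace-aware run should report the pdfaid namespace URI. That is consistent with the {{attribute.getNamespaceURI() != null}} check above never matching until the factory is made namespace aware.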
> PDFA schema not detected
> ------------------------
>
>                 Key: PDFBOX-5977
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-5977
>             Project: PDFBox
>          Issue Type: Bug
>          Components: JempBox
>    Affects Versions: 1.8.17
>            Reporter: Tilman Hausherr
>            Priority: Major
>
> {code:java}
> String s = "<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"no\"?>\n" +
>     "<?xpacket begin=\"\" id=\"W5M0MpCehiHzreSzNTczkc9d\"?><rdf:RDF xmlns:rdf=\"http://www.w3.org/1999/02/22-rdf-syntax-ns#\" xmlns:pdf=\"http://ns.adobe.com/pdf/1.3/\" xmlns:pdfaid=\"http://www.aiim.org/pdfa/ns/id/\">\n" +
>     " <rdf:Description pdfaid:conformance=\"B\" pdfaid:part=\"3\" rdf:about=\"\"/>\n" +
>     " <rdf:Description pdf:Producer=\"WeasyPrint 64.1\" rdf:about=\"\"/>\n" +
>     "</rdf:RDF><?xpacket end=\"r\"?>";
> XMPMetadata xmp = XMPMetadata.load(new ByteArrayInputStream(s.getBytes()));
> xmp.addXMLNSMapping(XMPSchemaPDFAId.NAMESPACE, XMPSchemaPDFAId.class);
> XMPSchemaPDFAId schema = (XMPSchemaPDFAId) xmp.getSchemaByClass(XMPSchemaPDFAId.class);
> System.out.println(schema.getConformance() + " " + schema.getPart());
> {code}
> This fails with an NPE because {{xmp.getSchemaByClass(XMPSchemaPDFAId.class)}} is null.
> While most PDFBox users use xmpbox, some may still use jempbox due to bugs, especially our sister project Apache Tika.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)