[
https://issues.apache.org/jira/browse/PDFBOX-5977?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17937447#comment-17937447
]
Tilman Hausherr commented on PDFBOX-5977:
-----------------------------------------
One thing to do is to add {{builderFactory.setNamespaceAware(true);}} to
{{XMLUtil.parse()}}, because without it the namespace information is lost. The
other is a change in {{XMPMetadata.getSchemas()}} at line 645:
{code:java}
else if (attribute.getNamespaceURI() != null &&
         nsMappings.containsKey(attribute.getNamespaceURI()) &&
         name.contains(":"))
{
    // The attribute's namespace URI is only available when the document
    // was parsed with a namespace-aware DocumentBuilderFactory.
    Class<?> schemaClass = nsMappings.get(attribute.getNamespaceURI());
    try
    {
        // Prefix part of the qualified name, e.g. "pdfaid" for "pdfaid:part"
        String prefix = name.substring(0, name.indexOf(':'));
        Constructor<?> ctor = schemaClass
                .getDeclaredConstructor(new Class[] { Element.class,
                                                      String.class });
        retval.add((XMPSchema) ctor.newInstance(new Object[] { schema,
                                                               prefix }));
        found = true;
    }
    catch (NoSuchMethodException e)
    {
        throw new IOException(
                "Error: Class "
                + schemaClass.getName()
                + " must have a constructor with the signature of "
                + schemaClass.getName()
                + "( org.w3c.dom.Element, java.lang.String )");
    }
    catch (Exception e)
    {
        e.printStackTrace();
        throw new IOException(e.getMessage());
    }
}
{code}
This should be refactored because the preceding branch is almost identical,
but I didn't do it for now so that the change is easy to see.
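One possible shape for that refactoring is to move the reflective constructor call and its error handling into one helper that both branches call. This is only a sketch: {{SchemaFactorySketch}}, {{newSchema}} and {{DemoSchema}} are hypothetical names, not JempBox API. It assumes schema classes keep the {{(org.w3c.dom.Element, java.lang.String)}} constructor demanded by the error message, and it uses Java 7 APIs ({{ReflectiveOperationException}}), which may be newer than what JempBox 1.8 compiles against.

```java
import java.io.IOException;
import java.lang.reflect.Constructor;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class SchemaFactorySketch {

    // Hypothetical stand-in for a schema class such as XMPSchemaPDFAId;
    // only the (Element, String) constructor matters here.
    static class DemoSchema {
        final Element element;
        final String prefix;

        DemoSchema(Element element, String prefix) {
            this.element = element;
            this.prefix = prefix;
        }
    }

    // Hypothetical shared helper: both branches of getSchemas() could call this
    // instead of repeating the reflection and the error handling.
    static <T> T newSchema(Class<?> schemaClass, Class<T> expectedType,
                           Element schema, String prefix) throws IOException {
        try {
            Constructor<?> ctor = schemaClass.getDeclaredConstructor(Element.class, String.class);
            return expectedType.cast(ctor.newInstance(schema, prefix));
        } catch (NoSuchMethodException e) {
            throw new IOException("Error: Class " + schemaClass.getName()
                    + " must have a constructor with the signature of "
                    + schemaClass.getName()
                    + "( org.w3c.dom.Element, java.lang.String )", e);
        } catch (ReflectiveOperationException e) {
            // InstantiationException, IllegalAccessException, InvocationTargetException;
            // the cause is kept instead of calling printStackTrace()
            throw new IOException(e.getMessage(), e);
        }
    }

    public static void main(String[] args) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        Element description = doc.createElementNS(
                "http://www.w3.org/1999/02/22-rdf-syntax-ns#", "rdf:Description");
        DemoSchema s = newSchema(DemoSchema.class, DemoSchema.class, description, "pdfaid");
        System.out.println(s.prefix); // pdfaid
        try {
            // String has no (Element, String) constructor
            newSchema(String.class, String.class, description, "pdfaid");
        } catch (IOException e) {
            System.out.println(e.getMessage()); // the "must have a constructor" error
        }
    }
}
```

Passing the expected type and using {{Class.cast}} keeps the helper free of unchecked casts; inside {{getSchemas()}} the call would presumably be {{newSchema(schemaClass, XMPSchema.class, schema, prefix)}}.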
There's also a bug in the existing code: the "schema" object is created
several times instead of only once.
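Why {{setNamespaceAware(true)}} matters can be checked with plain JAXP, independent of JempBox. In the repro below, the {{xmlns:pdfaid}} declaration sits on {{rdf:RDF}}, not on the {{rdf:Description}} element that carries {{pdfaid:part}}, so the attribute's namespace has to come from {{Attr.getNamespaceURI()}}, and DOM only fills that in when the parser is namespace-aware. The class {{NamespaceAwareDemo}}, the helper {{partNamespace}} and the sample XML are hypothetical, a minimal sketch modelled on the repro:

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Attr;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class NamespaceAwareDemo {

    // Hypothetical sample modelled on the repro: xmlns:pdfaid is declared
    // on rdf:RDF, not on the rdf:Description that uses the prefix.
    static final String SAMPLE =
            "<rdf:RDF xmlns:rdf=\"http://www.w3.org/1999/02/22-rdf-syntax-ns#\""
            + " xmlns:pdfaid=\"http://www.aiim.org/pdfa/ns/id/\">"
            + "<rdf:Description pdfaid:conformance=\"B\" pdfaid:part=\"3\" rdf:about=\"\"/>"
            + "</rdf:RDF>";

    // Parses xml and returns the namespace URI that DOM reports for the
    // pdfaid:part attribute of the first rdf:Description element.
    static String partNamespace(String xml, boolean namespaceAware) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(namespaceAware);
        Document doc = factory.newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        // Lookup by qualified name works in both parsing modes
        Element description = (Element) doc.getElementsByTagName("rdf:Description").item(0);
        Attr part = description.getAttributeNode("pdfaid:part");
        return part.getNamespaceURI();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(partNamespace(SAMPLE, false)); // null
        System.out.println(partNamespace(SAMPLE, true));  // http://www.aiim.org/pdfa/ns/id/
    }
}
```

Without the flag, the {{attribute.getNamespaceURI() != null}} check in the new branch can never succeed, which is why the two changes belong together.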
> PDFA schema not detected
> ------------------------
>
> Key: PDFBOX-5977
> URL: https://issues.apache.org/jira/browse/PDFBOX-5977
> Project: PDFBox
> Issue Type: Bug
> Components: JempBox
> Affects Versions: 1.8.17
> Reporter: Tilman Hausherr
> Priority: Major
>
> {code:java}
> String s = "<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"no\"?>\n" +
>         "<?xpacket begin=\"\" id=\"W5M0MpCehiHzreSzNTczkc9d\"?>" +
>         "<rdf:RDF xmlns:rdf=\"http://www.w3.org/1999/02/22-rdf-syntax-ns#\"" +
>         " xmlns:pdf=\"http://ns.adobe.com/pdf/1.3/\"" +
>         " xmlns:pdfaid=\"http://www.aiim.org/pdfa/ns/id/\">\n" +
>         " <rdf:Description pdfaid:conformance=\"B\" pdfaid:part=\"3\"" +
>         " rdf:about=\"\"/>\n" +
>         " <rdf:Description pdf:Producer=\"WeasyPrint 64.1\" rdf:about=\"\"/>\n" +
>         "</rdf:RDF><?xpacket end=\"r\"?>";
> XMPMetadata xmp = XMPMetadata.load(new ByteArrayInputStream(s.getBytes()));
> xmp.addXMLNSMapping(XMPSchemaPDFAId.NAMESPACE, XMPSchemaPDFAId.class);
> XMPSchemaPDFAId schema = (XMPSchemaPDFAId) xmp.getSchemaByClass(XMPSchemaPDFAId.class);
> System.out.println(schema.getConformance() + " " + schema.getPart());
> {code}
> This fails with an NPE because
> {{xmp.getSchemaByClass(XMPSchemaPDFAId.class)}} is null.
> While most PDFBox users use xmpbox, some may still use jempbox due to bugs,
> especially our sister project Apache Tika.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)