[
https://issues.apache.org/jira/browse/PDFBOX-5976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17937684#comment-17937684
]
Jochen Stärk commented on PDFBOX-5976:
--------------------------------------
Hello,
in conjunction with preflight, with the dependencies declared like this:
{{<dependency>}}
{{<groupId>org.apache.pdfbox</groupId>}}
{{<artifactId>preflight</artifactId>}}
{{<version>3.0.5</version>}}
{{<scope>system</scope>}}
{{<systemPath>${project.basedir}/preflight-app-3.0.5-20250322.104829-67.jar</systemPath>}}
{{</dependency>}}
{{<dependency>}}
{{<groupId>org.apache.pdfbox</groupId>}}
{{<artifactId>pdfbox</artifactId>}}
{{<version>3.0.5</version>}}
{{<scope>system</scope>}}
{{<systemPath>${project.basedir}/pdfbox-app-3.0.5-20250322.104829-66.jar</systemPath>}}
{{</dependency>}}
I can confirm it works, thanks a lot!
I assume the -app JARs bundle their dependencies, i.e. what I previously only
knew as "shaded"?
FYI, I only need this functionality on the very outskirts of
mustangproject.org. Until your 3.0.5 release, I am considering the following
XPath-based workaround:
{{InputStream exportXMPMetadata = metadata.exportXMPMetadata();}}
{{DocumentBuilderFactory builderFactory = DocumentBuilderFactory.newInstance();}}
{{builderFactory.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);}}
{{builderFactory.setNamespaceAware(true);}}
{{DocumentBuilder builder = builderFactory.newDocumentBuilder();}}
{{Document xmlDocument = builder.parse(exportXMPMetadata);}}
{{XPath xPath = XPathFactory.newInstance().newXPath();}}
{{xPath.setNamespaceContext(new NamespaceContext() {}}
{{@Override}}
{{public String getNamespaceURI(String prefix) {}}
{{if ("pdfaid".equals(prefix)) {}}
{{return "http://www.aiim.org/pdfa/ns/id/";}}
{{}}}
{{return null;}}
{{}}}
{{@Override}}
{{public String getPrefix(String namespaceURI) {}}
{{// This should be implemented but I'm lazy and this sample works without it}}
{{return null;}}
{{}}}
{{@Override}}
{{public Iterator<String> getPrefixes(String namespaceURI) {}}
{{// This should be implemented but I'm lazy and this sample works without it}}
{{return null;}}
{{}}}
{{});}}
{{String expression = "//@pdfaid:part";}}
{{NodeList childNodes = (NodeList) xPath.compile(expression).evaluate(xmlDocument, XPathConstants.NODESET);}}
{{for (int n = childNodes.getLength() - 1; n >= 0; n--) {}}
{{Node child = childNodes.item(n);}}
{{short nodeType = child.getNodeType();}}
{{if (nodeType == Node.ELEMENT_NODE) {}}
{{// clean(child);}}
{{}}}
{{else if (nodeType == Node.ATTRIBUTE_NODE) {}}
{{String trimmedNodeVal = child.getNodeValue().trim();}}
{{System.out.println("PDF/A version " + trimmedNodeVal);}}
{{//child.setNodeValue(trimmedNodeVal);}}
{{} else if (nodeType == Node.COMMENT_NODE) {}}
{{// node.removeChild(child);}}
{{}}}
{{}}}
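
For reference, the same idea can be sketched as a self-contained program that needs only the JDK. This is a minimal sketch, not PDFBox code: the class name PdfaPartXPath, the helper readPdfaPart and the constant SAMPLE_XMP are hypothetical names chosen for illustration, and the sample string is the first XMP packet from the issue quoted below. In real use the XML would come from metadata.exportXMPMetadata() instead of a string.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.Collections;
import java.util.Iterator;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;

class PdfaPartXPath {

    static final String PDFAID_NS = "http://www.aiim.org/pdfa/ns/id/";

    // First XMP example from the issue: namespaces declared on rdf:RDF only.
    static final String SAMPLE_XMP =
        "<rdf:RDF xmlns:pdf=\"http://ns.adobe.com/pdf/1.3/\""
        + " xmlns:pdfaid=\"http://www.aiim.org/pdfa/ns/id/\""
        + " xmlns:rdf=\"http://www.w3.org/1999/02/22-rdf-syntax-ns#\">"
        + "<rdf:Description rdf:about=\"\" pdfaid:part=\"3\" pdfaid:conformance=\"B\" />"
        + "<rdf:Description rdf:about=\"\" pdf:Producer=\"WeasyPrint 64.1\" />"
        + "</rdf:RDF>";

    /** Returns the trimmed value of the first pdfaid:part attribute, or "" if there is none. */
    static String readPdfaPart(String xmp) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder()
                .parse(new ByteArrayInputStream(xmp.getBytes(StandardCharsets.UTF_8)));

            XPath xPath = XPathFactory.newInstance().newXPath();
            xPath.setNamespaceContext(new NamespaceContext() {
                @Override
                public String getNamespaceURI(String prefix) {
                    return "pdfaid".equals(prefix) ? PDFAID_NS : XMLConstants.NULL_NS_URI;
                }

                @Override
                public String getPrefix(String namespaceURI) {
                    return PDFAID_NS.equals(namespaceURI) ? "pdfaid" : null;
                }

                @Override
                public Iterator<String> getPrefixes(String namespaceURI) {
                    String prefix = getPrefix(namespaceURI);
                    return prefix == null
                        ? Collections.<String>emptyIterator()
                        : Collections.singletonList(prefix).iterator();
                }
            });
            // Evaluating to a string yields the value of the first match, or "".
            return xPath.evaluate("//@pdfaid:part", doc).trim();
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not read pdfaid:part", e);
        }
    }

    public static void main(String[] args) {
        System.out.println("PDF/A part: " + readPdfaPart(SAMPLE_XMP));
    }
}
```

Because the XPath expression only selects attributes, the element and comment branches of the loop above are not needed in this variant.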
> DomXmpParser incorrectly expects namespaces on attribute level
> --------------------------------------------------------------
>
> Key: PDFBOX-5976
> URL: https://issues.apache.org/jira/browse/PDFBOX-5976
> Project: PDFBox
> Issue Type: Bug
> Components: XmpBox
> Affects Versions: 2.0.33, 3.0.4 PDFBox
> Reporter: Jochen Stärk
> Assignee: Tilman Hausherr
> Priority: Major
> Labels: xml
> Fix For: 2.0.34, 3.0.5 PDFBox, 4.0.0
>
> Attachments: AN-10005_v28_2025-03-19-2.pdf,
> AN-10005_v28_2025-03-19x-1.pdf
>
>
> When trying to determine the PDF/A version like this:
> {{try (PDDocument document = Loader.loadPDF(new File("AN-10005_v28_2025-03-19.pdf"))) {}}
> {{PDDocumentCatalog catalog = document.getDocumentCatalog();}}
> {{PDMetadata metadata = catalog.getMetadata();}}
> {{DomXmpParser xmpParser = new DomXmpParser();}}
> {{XMPMetadata xmp = xmpParser.parse(metadata.createInputStream());}}
> {{PDFAIdentificationSchema pdfaSchema = xmp.getPDFAIdentificationSchema();}}
> {{if (pdfaSchema != null) {}}
> {{System.out.println("It's a PDF A-" + pdfaSchema.getPart());}}
> {{}}}
> {{} catch (XmpParsingException | IOException e) {}}
> {{e.printStackTrace();}}
> {{}}}
> on the attached (and valid) PDF/A-3b file AN-10005_v28_2025-03-19-2.pdf, PDFBox
> incorrectly fails with
>
> {{org.apache.xmpbox.xml.XmpParsingException: Schema is not set in this document : http://www.aiim.org/pdfa/ns/id/}}
> {{ at org.apache.xmpbox.xml.DomXmpParser.checkPropertyDefinition(DomXmpParser.java:920)}}
> {{ at org.apache.xmpbox.xml.DomXmpParser.parseDescriptionRootAttr(DomXmpParser.java:276)}}
> {{ at org.apache.xmpbox.xml.DomXmpParser.parseDescriptionRoot(DomXmpParser.java:247)}}
> {{ at org.apache.xmpbox.xml.DomXmpParser.parse(DomXmpParser.java:201)}}
> {{ at de.usegroup.Main.main(Main.java:25)}}
>
> After changing the metadata stream with iText RUPS from
> {{<rdf:RDF xmlns:pdf="http://ns.adobe.com/pdf/1.3/"
> xmlns:pdfaid="http://www.aiim.org/pdfa/ns/id/"
> xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"><rdf:Description
> rdf:about="" pdfaid:part="3" pdfaid:conformance="B" /><rdf:Description
> rdf:about="" pdf:Producer="WeasyPrint 64.1" /></rdf:RDF>}}
> to
> {{ <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">}}
> {{ <rdf:Description rdf:about=""}}
> {{ xmlns:pdfaid="http://www.aiim.org/pdfa/ns/id/"}}
> {{ xmlns:pdf="http://ns.adobe.com/pdf/1.3/"}}
> {{ xmlns:xmp="http://ns.adobe.com/xap/1.0/"}}
> {{ pdfaid:conformance="B"}}
> {{ pdfaid:part="3"}}
> {{ pdf:Producer="WeasyPrint 64.1; modified using iText® Core 7.2.5
> (AGPL version) ©2000-2023 iText Group NV"}}
> {{ xmp:ModifyDate="2025-03-21T08:16:58+01:00"/>}}
> {{ </rdf:RDF>}}
> (i.e. putting the namespace definitions in the rdf:Description element, see
> AN-10005_v28_2025-03-19x-1.pdf), parsing works.
> The issue is: it should be sufficient to put the namespace definitions in the
> root element, "RDF", i.e. the first example should also work.
>
> When searching for similar issues I had the impression this may be similar to
> PDFBOX-2913.
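
To illustrate the point made in the quoted issue, that a namespace declared on rdf:RDF is in scope for all descendant elements and their attributes, here is a minimal sketch using only the JDK's namespace-aware DOM parser. It is not how DomXmpParser works internally and it is not the fix; the class name NamespaceScopeDemo and the helper pdfaPart are hypothetical names for illustration, and the two sample strings are shortened versions of the two XMP packets from the issue.

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class NamespaceScopeDemo {

    static final String RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#";
    static final String PDFAID_NS = "http://www.aiim.org/pdfa/ns/id/";

    // Variant 1: pdfaid is declared on the root element, rdf:RDF.
    static final String DECLARED_ON_ROOT =
        "<rdf:RDF xmlns:rdf=\"" + RDF_NS + "\" xmlns:pdfaid=\"" + PDFAID_NS + "\">"
        + "<rdf:Description rdf:about=\"\" pdfaid:part=\"3\" pdfaid:conformance=\"B\"/>"
        + "</rdf:RDF>";

    // Variant 2: pdfaid is declared on rdf:Description itself.
    static final String DECLARED_ON_DESCRIPTION =
        "<rdf:RDF xmlns:rdf=\"" + RDF_NS + "\">"
        + "<rdf:Description rdf:about=\"\" xmlns:pdfaid=\"" + PDFAID_NS + "\""
        + " pdfaid:part=\"3\" pdfaid:conformance=\"B\"/>"
        + "</rdf:RDF>";

    /** Returns the pdfaid:part value of the first rdf:Description carrying it, or null. */
    static String pdfaPart(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));

            NodeList descriptions = doc.getElementsByTagNameNS(RDF_NS, "Description");
            for (int i = 0; i < descriptions.getLength(); i++) {
                Element description = (Element) descriptions.item(i);
                // Look the attribute up by namespace URI, not by prefix, so the
                // place where the prefix was declared does not matter.
                if (description.hasAttributeNS(PDFAID_NS, "part")) {
                    return description.getAttributeNS(PDFAID_NS, "part");
                }
            }
            return null;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XMP", e);
        }
    }

    public static void main(String[] args) {
        System.out.println("declared on rdf:RDF:         " + pdfaPart(DECLARED_ON_ROOT));
        System.out.println("declared on rdf:Description: " + pdfaPart(DECLARED_ON_DESCRIPTION));
    }
}
```

Both variants resolve pdfaid:part to the same namespace URI, which is the behaviour the issue asks DomXmpParser to match.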