[ https://issues.apache.org/jira/browse/PDFBOX-2913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17937694#comment-17937694 ]
Tilman Hausherr edited comment on PDFBOX-2913 at 3/23/25 2:15 PM:
------------------------------------------------------------------

Wow this issue is now almost 10 years old. I've tried a few things over the years but was never successful, so I should write down my thoughts / observations. Unlike other xmpbox changes I made over the years, this won't be a few lines.

This rdf thing is partly supported but not as a schema. "<rdf:value>" isn't supported at all.

[^xmp673189-ok.xml] is another file with "<rdf:value>" that doesn't fail. But it doesn't work properly either. When debugging why this worked, I looked at this part
{code:xml}
<desc:FileName rdf:parseType="Resource">
  <rdf:value>E:\Pam_Ward\INS Forms-EB-2004\WIP XFT\I-102_v5.xft</rdf:value>
  <desc:ref>/template/subform[1]</desc:ref>
</desc:FileName>
{code}
and it returns "E:\Pam_Ward\INS Forms-EB-2004\WIP XFT\I-102_v5.xft/template/subform[1]", i.e. the two child values concatenated. This happens because this line is called
{code:java}
manageSimpleType(xmp, property, Types.Text, container);
{code}
If I delete the "<rdf:value>A</rdf:value>" from the example in the issue description, it will still fail, because xmpidq:Scheme isn't implemented. It's mentioned here:
https://pdfa.org/wp-content/uploads/2011/08/tn0008_predefined_xmp_properties_in_pdfa-1_2008-03-20.pdf

I tried to add it as a schema but this doesn't work, it has to be an AbstractSimpleProperty. I had a look at all the 250000 files of the digitalcorpora corpus, none of them has xmpidq.

We could try to implement it, but I'm not sure how: this isn't a full schema, it's a single property. Implementing it as a property (similar to the GPS property) made it fail elsewhere. Implementing a minimal schema file also didn't help.

> DomXmpParser fails on property containing qualifier
> ---------------------------------------------------
>
>                 Key: PDFBOX-2913
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-2913
>             Project: PDFBox
>          Issue Type: Bug
>          Components: XmpBox
>    Affects Versions: 1.8.10
>            Reporter: Petras
>            Priority: Major
>         Attachments: qualified_li.xmp, screenshot-1.png, xmp673189-ok.xml
>
>
> According to XMP specification properties may have qualifiers. In our
> scenario we used {{xmp:Identifier}} element from XMP Basic Schema holding an
> array of text strings.
> An array item may be qualified with {{xmpidq:Scheme}}:
> {code:xml}
> <rdf:Description rdf:about=""
>     xmlns:xmp="http://ns.adobe.com/xap/1.0/"
>     xmlns:xmpidq="http://ns.adobe.com/xmp/Identifier/qual/1.0/">
>   <xmp:Identifier>
>     <rdf:Bag>
>       <rdf:li rdf:parseType="Resource">
>         <rdf:value>A</rdf:value>
>         <xmpidq:Scheme>http://archyvai.lt/pdf-ltud/2013/level/</xmpidq:Scheme>
>       </rdf:li>
>     </rdf:Bag>
>   </xmp:Identifier>
> </rdf:Description>
> {code}
> {{DomXmpParser}} fails when parsing XMP containing such qualifiers:
> {code}
> org.apache.xmpbox.xml.XmpParsingException: Schema is not set in this document : http://www.w3.org/1999/02/22-rdf-syntax-ns#
>     at org.apache.xmpbox.xml.DomXmpParser.checkPropertyDefinition(DomXmpParser.java:787)
>     at org.apache.xmpbox.xml.DomXmpParser.parseLiDescription(DomXmpParser.java:508)
>     at org.apache.xmpbox.xml.DomXmpParser.parseLiElement(DomXmpParser.java:449)
>     at org.apache.xmpbox.xml.DomXmpParser.manageArray(DomXmpParser.java:407)
>     at org.apache.xmpbox.xml.DomXmpParser.createProperty(DomXmpParser.java:309)
>     at org.apache.xmpbox.xml.DomXmpParser.parseDescriptionRoot(DomXmpParser.java:267)
>     at org.apache.xmpbox.xml.DomXmpParser.parse(DomXmpParser.java:199)
>     at org.apache.xmpbox.TestXMPWithDefinedSchemas.main(TestXMPWithDefinedSchemas.java:66)
> ...
> {code}
> It appears it failed on {{rdf:value}} element as
> {{org.apache.xmpbox.type.TypeMapping}} class is not aware about
> {{http://www.w3.org/1999/02/22-rdf-syntax-ns#}} standard namespace.
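
To make the difference between "value" and "qualifier" concrete, here is a minimal sketch using only the JDK DOM API. It is not xmpbox code and not a proposed patch: {{QualifiedValueSketch}}, {{QualifiedValue}}, {{firstLi}} and {{split}} are hypothetical names, and the rdf:RDF wrapper around the sample is added only so the snippet parses on its own. It shows that reading a parseType="Resource" item as flat text runs the value and the qualifier together (similar in effect to the concatenated FileName result above, although whether xmpbox reads the text this way internally is an assumption), while treating {{rdf:value}} as the value and every other child as a qualifier keeps them apart.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Illustration only: plain JDK DOM, no xmpbox classes involved.
class QualifiedValueSketch {
    static final String RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#";

    // The qualified array item from the issue description, wrapped in rdf:RDF
    // (the wrapper is an addition so that the sample parses standalone).
    static final String SAMPLE =
        "<rdf:RDF xmlns:rdf='" + RDF_NS + "'"
        + " xmlns:xmp='http://ns.adobe.com/xap/1.0/'"
        + " xmlns:xmpidq='http://ns.adobe.com/xmp/Identifier/qual/1.0/'>"
        + "<rdf:Description rdf:about=''><xmp:Identifier><rdf:Bag>"
        + "<rdf:li rdf:parseType='Resource'>"
        + "<rdf:value>A</rdf:value>"
        + "<xmpidq:Scheme>http://archyvai.lt/pdf-ltud/2013/level/</xmpidq:Scheme>"
        + "</rdf:li>"
        + "</rdf:Bag></xmp:Identifier></rdf:Description></rdf:RDF>";

    // Hypothetical holder: the actual value plus its qualifiers by qualified name.
    static final class QualifiedValue {
        final String value;
        final Map<String, String> qualifiers;

        QualifiedValue(String value, Map<String, String> qualifiers) {
            this.value = value;
            this.qualifiers = qualifiers;
        }
    }

    // Parses the XML namespace-aware and returns the first rdf:li element.
    static Element firstLi(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            return (Element) doc.getElementsByTagNameNS(RDF_NS, "li").item(0);
        } catch (Exception e) {
            throw new IllegalStateException("sample XML could not be parsed", e);
        }
    }

    // rdf:value carries the value; every other child element is kept as a qualifier.
    static QualifiedValue split(Element li) {
        String value = null;
        Map<String, String> qualifiers = new LinkedHashMap<>();
        NodeList children = li.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() != Node.ELEMENT_NODE) {
                continue; // skip whitespace text nodes and comments
            }
            if (RDF_NS.equals(child.getNamespaceURI()) && "value".equals(child.getLocalName())) {
                value = child.getTextContent();
            } else {
                qualifiers.put(child.getNodeName(), child.getTextContent());
            }
        }
        return new QualifiedValue(value, qualifiers);
    }

    public static void main(String[] args) {
        Element li = firstLi(SAMPLE);
        // Flat text view: value and qualifier are run together.
        System.out.println("flattened: " + li.getTextContent());
        QualifiedValue q = split(li);
        System.out.println("value:     " + q.value);
        System.out.println("qualifier: " + q.qualifiers);
    }
}
```

The sketch sidesteps the hard part discussed in the comment, namely how a qualifier such as xmpidq:Scheme would fit into the xmpbox type system; it only illustrates the separation a parser would need to make.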