[ https://issues.apache.org/jira/browse/PDFBOX-3471?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15434456#comment-15434456 ]

Maruan Sahyoun commented on PDFBOX-3471:
----------------------------------------

I did it locally but wanted to run it past you to get your feedback. As for the 
TODO: I included it because I haven't fully thought that case through and wanted 
to make sure it is captured. IMHO there is a difference between an empty text 
node and a text node which contains only whitespace. Moving forward, I'd prefer 
not to change the XMP while parsing, so that serializing gives you the same 
content back. Having said that, XMPBox is not a general XMP handling library; in 
its current state it is targeted at validating XMP as part of PDF/A, so the 
changes are (currently) OK.
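
To illustrate the distinction above, here is a minimal sketch using only the
standard org.w3c.dom API. It is not the attached patch and not XMPBox code: the
class XmpChildFilter and its methods are hypothetical names. The idea is to
skip comment and text nodes while looking for element children, leaving the
DOM untouched so that serialization would still return the original content,
and to tell an empty text node apart from a whitespace-only one.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper for illustration only; not part of XMPBox.
final class XmpChildFilter {

    // Collects the element children of parent. Comment and text nodes are
    // skipped but NOT removed, so the DOM stays exactly as it was parsed.
    static List<Element> elementChildren(Element parent) {
        List<Element> result = new ArrayList<>();
        NodeList children = parent.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.ELEMENT_NODE) {
                result.add((Element) child);
            }
        }
        return result;
    }

    // True for a text node with no characters at all.
    static boolean isEmptyText(Node node) {
        return node.getNodeType() == Node.TEXT_NODE
                && node.getNodeValue().isEmpty();
    }

    // True for a text node that has characters, all of them whitespace.
    static boolean isWhitespaceOnlyText(Node node) {
        return node.getNodeType() == Node.TEXT_NODE
                && !node.getNodeValue().isEmpty()
                && node.getNodeValue().trim().isEmpty();
    }

    // Parses an XML string with a namespace-aware JAXP parser and returns
    // the root element. Comments are kept (the JAXP default).
    static Element parseRoot(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }
}
```

With a root such as x:xmpmeta that holds a comment, a line break and one
rdf:RDF element, elementChildren returns only the rdf:RDF element, while
getChildNodes() on the root still reports all three nodes.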

> XMP parsing fails if XMP contain comments
> -----------------------------------------
>
>                 Key: PDFBOX-3471
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-3471
>             Project: PDFBox
>          Issue Type: Bug
>          Components: XmpBox
>    Affects Versions: 2.0.2
>            Reporter: Petras
>         Attachments: PDFBOX-3471_XmpParsingIgnoringComments.patch
>
>
> DomXmpParser fails on the following valid XMP:
> {code:xml}
> <?xpacket begin="" id="W5M0MpCehiHzreSzNTczkc9d"?>
> <x:xmpmeta xmlns:x="adobe:ns:meta/" x:xmptk="Adobe XMP Core 5.1.0-jc003">
>     <!-- PDF/A standard version (1 or 2) and conformance level (A, B or U) -->
>     <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
>         <rdf:Description rdf:about = ""
>                          xmlns:pdfaid = "http://www.aiim.org/pdfa/ns/id/">
>             <pdfaid:part>1</pdfaid:part>
>             <pdfaid:conformance>B</pdfaid:conformance>
>         </rdf:Description>
>     </rdf:RDF>
> </x:xmpmeta>
> <?xpacket end="w"?>
> {code}
> DomXmpParser finds the comment node and fails:
> {code}
> org.apache.xmpbox.xml.XmpParsingException: More than one element found in x:xmpmeta
>       at org.apache.xmpbox.xml.DomXmpParser.findDescriptionsParent(DomXmpParser.java:750)
>       at org.apache.xmpbox.xml.DomXmpParser.parse(DomXmpParser.java:183)
>       at org.apache.xmpbox.xml.DomXmpParser.parse(DomXmpParser.java:111)
> ...
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
