[ 
https://issues.apache.org/jira/browse/TIKA-531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12926796#action_12926796
 ] 

Jukka Zitting commented on TIKA-531:
------------------------------------

How is the output invalid XML? The name attribute in <meta name="xmpTPg:NPages" 
content="..."/> is defined as a plain CDATA attribute by XHTML, so a parser 
shouldn't try to parse its contents as an XML name.
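To illustrate the point: Namespaces in XML constrains element and attribute *names*, not attribute *values*. Here is a minimal sketch, assuming the JDK's built-in DOM parser with namespace awareness switched on and a hypothetical XHTML fragment shaped like the output in question. It shows that the colon inside the name value is never resolved as a prefix.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class MetaNameCheck {
    static final String XHTML_NS = "http://www.w3.org/1999/xhtml";

    // Hypothetical XHTML fragment resembling the <meta> output discussed above.
    static final String XHTML =
        "<html xmlns=\"" + XHTML_NS + "\">"
        + "<head><meta name=\"xmpTPg:NPages\" content=\"2\"/></head>"
        + "<body/></html>";

    /** Parses the fragment namespace-aware and returns the meta element's name attribute. */
    static String metaName() throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        // With namespace awareness on, prefixes in element/attribute names must be declared.
        factory.setNamespaceAware(true);
        DocumentBuilder builder = factory.newDocumentBuilder();
        Document doc = builder.parse(
            new ByteArrayInputStream(XHTML.getBytes(StandardCharsets.UTF_8)));
        Element meta = (Element) doc.getElementsByTagNameNS(XHTML_NS, "meta").item(0);
        // The attribute value comes back verbatim; "xmpTPg" is not looked up as a prefix.
        return meta.getAttribute("name");
    }

    public static void main(String[] args) throws Exception {
        System.out.println(metaName()); // prints xmpTPg:NPages
    }
}
```

The parse succeeds even though no xmpTPg namespace is declared anywhere in the document.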

Note that down the line we may want to switch to something like RDFa for 
serializing metadata attributes, but for now the metadata names should be 
treated as plain strings, even though the XMP ones look like prefixed XML 
names.

> xmpTPg:NPages creates invalid XML
> ---------------------------------
>
>                 Key: TIKA-531
>                 URL: https://issues.apache.org/jira/browse/TIKA-531
>             Project: Tika
>          Issue Type: Bug
>          Components: metadata
>    Affects Versions: 0.8
>            Reporter: Sjoerd Smeets
>             Fix For: 0.8
>
>
> Hi,
> Parsing MS Office files or PDF documents results invalid XML as there is a 
> missing name-space definition for xmpTPg:NPages. What would be the best 
> approach, renaming this field or add the name-space definition to the header 
> of the output xml?
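Whether a missing namespace definition matters depends on where the prefixed name appears. Here is a minimal sketch, again assuming the JDK's namespace-aware DOM parser, that checks three hypothetical documents. An undeclared prefix on an element name is rejected, the same element is accepted once `xmlns:xmpTPg` is declared, and a prefixed string used only as an attribute value is accepted with no declaration. The namespace URI below is the one the XMP specification associates with the xmpTPg prefix. Treat it as an assumption; any declared URI would satisfy the parser.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

public class PrefixDeclarationCheck {
    // Hypothetical documents; the URI is assumed to be the XMP Paged-Text namespace.
    static final String UNDECLARED_ELEMENT =
        "<doc><xmpTPg:NPages>2</xmpTPg:NPages></doc>";
    static final String DECLARED_ELEMENT =
        "<doc xmlns:xmpTPg=\"http://ns.adobe.com/xap/1.0/t/pg/\">"
        + "<xmpTPg:NPages>2</xmpTPg:NPages></doc>";
    static final String PREFIXED_ATTRIBUTE_VALUE =
        "<doc><meta name=\"xmpTPg:NPages\" content=\"2\"/></doc>";

    /** Returns true if the string is namespace-well-formed XML, false on a parse error. */
    static boolean parses(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        DocumentBuilder builder = factory.newDocumentBuilder();
        // Throw on errors instead of printing them to stderr.
        builder.setErrorHandler(new ErrorHandler() {
            @Override public void warning(SAXParseException e) { }
            @Override public void error(SAXParseException e) throws SAXException { throw e; }
            @Override public void fatalError(SAXParseException e) throws SAXException { throw e; }
        });
        try {
            builder.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            return true;
        } catch (SAXException e) {
            return false; // e.g. an unbound prefix on an element name
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(parses(UNDECLARED_ELEMENT));       // false
        System.out.println(parses(DECLARED_ELEMENT));         // true
        System.out.println(parses(PREFIXED_ATTRIBUTE_VALUE)); // true
    }
}
```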

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
