[ 
https://issues.apache.org/jira/browse/TIKA-1486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14224714#comment-14224714
 ] 

Andrew Jackson commented on TIKA-1486:
--------------------------------------

There's no problem with adding an XML namespace in principle - I'm not using a 
MIME-info specific parser or anything. It's just that because the namespace is 
not declared, the document is not 
[namespace-well-formed|http://stackoverflow.com/questions/14871752/is-xml-document-with-undeclared-prefix-well-formed],
 and this upsets some parsers. It's not critical - it just makes it harder to 
parse the document with an off-the-shelf XML parser configuration.
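
As a rough illustration of the point above (not taken from the actual tika-mimetypes.xml), the sketch below feeds a namespace-aware parser a small fragment that uses a tika: prefix without declaring it. The fragment, the placeholder namespace URI and the helper name is_namespace_well_formed are all assumptions made up for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment: the "tika" prefix is used but never declared.
UNDECLARED = """<mime-info>
  <mime-type type="application/example">
    <tika:link>http://example.org/spec</tika:link>
  </mime-type>
</mime-info>"""

# Same fragment with the prefix bound; the namespace URI is a placeholder,
# not necessarily the one the Tika project would choose.
DECLARED = UNDECLARED.replace(
    "<mime-info>", '<mime-info xmlns:tika="http://example.org/tika-ns">', 1
)


def is_namespace_well_formed(xml_text):
    """Return True if ElementTree's namespace-aware parser accepts the text."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        # An undeclared prefix is reported by the parser as a parse error.
        return False
```

With this sketch, is_namespace_well_formed(UNDECLARED) returns False and is_namespace_well_formed(DECLARED) returns True, which is the behaviour a default parser configuration would be expected to show.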

On the globs, is there a functional difference between the "^rdf$" and "rdf" 
globs? If not, I'll just configure my analyser to strip out the ^ and $.
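
If there turns out to be no functional difference, the stripping step mentioned above could look something like the following minimal sketch. The function name strip_anchors is hypothetical, and it assumes the ^ and $ only ever appear as the first and last characters of a glob.

```python
def strip_anchors(glob):
    """Drop a leading '^' and a trailing '$' from a glob, if present."""
    if glob.startswith("^"):
        glob = glob[1:]
    if glob.endswith("$"):
        glob = glob[:-1]
    return glob
```

Under that assumption, strip_anchors("^rdf$") gives "rdf" and strip_anchors("^owl$") gives "owl", while a glob such as "*.txt" passes through unchanged.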

> Minor issues with the Tika MIME type magic file
> -----------------------------------------------
>
>                 Key: TIKA-1486
>                 URL: https://issues.apache.org/jira/browse/TIKA-1486
>             Project: Tika
>          Issue Type: Improvement
>          Components: detector
>    Affects Versions: 1.6
>            Reporter: Andrew Jackson
>            Priority: Minor
>
> I've started running some routine tests on format information held in a 
> number of tools, including 
> [Tika|http://www.digipres.org/formats/sources/tika/issues/]. This uncovered a 
> number of minor issues when working with the tika-mimetypes.xml file:
> * Duplicate MIME type application/gzip-compressed for type application/gzip.
> * Duplicate MIME type image/vnd.dwg for type image/vnd.dwg.
> * Error when parsing XML: Namespace prefix tika on link is not defined, line 
> 169, column 15
> * Format application/dita+xml;format=task has itself as a supertype!
> * Glob '^owl$' for entry application/rdf+xml does not appear to be a valid 
> filename specification.
> * Glob '^rdf$' for entry application/rdf+xml does not appear to be a valid 
> filename specification.
> With the last two, it's really a matter of consistency. The other 
> full-filename globs do *not* use the ^ and $ start and end markers, but owl 
> and rdf do.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
