[
https://issues.apache.org/jira/browse/TIKA-1486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14224745#comment-14224745
]
Andrew Jackson commented on TIKA-1486:
--------------------------------------
A-ha! I didn't notice the {{isregex="true"}} attribute - thank you! I'll modify
my parser accordingly.
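For reference, a minimal sketch of how a consumer of the file might honour that attribute. It assumes glob entries look like `<glob pattern="..." isregex="true"/>`; the XML fragment and the helper name `glob_matches` are illustrative and not taken from Tika itself.

```python
import fnmatch
import re
import xml.etree.ElementTree as ET

# Illustrative fragment only; the real tika-mimetypes.xml may differ.
SAMPLE = """
<mime-info>
  <mime-type type="application/rdf+xml">
    <glob pattern="*.rdf"/>
    <glob pattern="^owl$" isregex="true"/>
  </mime-type>
</mime-info>
"""

def glob_matches(glob_elem, filename):
    """Return True if filename matches the glob element.

    Patterns flagged isregex="true" are treated as regular expressions;
    everything else is treated as a shell-style filename glob.
    """
    pattern = glob_elem.get("pattern")
    if glob_elem.get("isregex") == "true":
        return re.search(pattern, filename) is not None
    return fnmatch.fnmatchcase(filename, pattern)

globs = ET.fromstring(SAMPLE).findall("./mime-type/glob")
```

With this split, `^owl$` is only ever interpreted as a regex, so it no longer looks like an invalid filename glob.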
FWIW, you don't need to write a schema to use a namespace, and the namespace
URI does not need to resolve to anything. But as I say, it's not crucial - I
suppose all XML parsers can be configured to ignore the issue.
Thanks again.
> Minor issues with the Tika MIME type magic file
> -----------------------------------------------
>
> Key: TIKA-1486
> URL: https://issues.apache.org/jira/browse/TIKA-1486
> Project: Tika
> Issue Type: Improvement
> Components: detector
> Affects Versions: 1.6
> Reporter: Andrew Jackson
> Priority: Minor
>
> I've started running some routine tests on format information held in a
> number of tools, including
> [Tika|http://www.digipres.org/formats/sources/tika/issues/]. This uncovered a
> number of minor issues when working with the tika-mimetypes.xml file:
> * Duplicate MIME type application/gzip-compressed for type application/gzip.
> * Duplicate MIME type image/vnd.dwg for type image/vnd.dwg.
> * Error when parsing XML: Namespace prefix tika on link is not defined, line
> 169, column 15
> * Format application/dita+xml;format=task has itself as a supertype!
> * Glob '^owl$' for entry application/rdf+xml does not appear to be a valid
> filename specification.
> * Glob '^rdf$' for entry application/rdf+xml does not appear to be a valid
> filename specification.
> With the last two, it's really a matter of consistency. The other
> full-filename globs do *not* use the ^ and $ start and end markers, but owl
> and rdf do.
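The duplicate and self-supertype checks in the quoted report can be sketched roughly as follows. This is not the reporter's actual test code: it assumes the file uses `<alias type="..."/>` and `<sub-class-of type="..."/>` children, reads "duplicate" as a name that occurs more than once among a type's own name and its aliases, and runs on a fragment invented to trigger each check.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Illustrative fragment only, built to trigger each check.
SAMPLE = """
<mime-info>
  <mime-type type="application/gzip">
    <alias type="application/gzip-compressed"/>
    <alias type="application/gzip-compressed"/>
  </mime-type>
  <mime-type type="application/dita+xml;format=task">
    <sub-class-of type="application/dita+xml;format=task"/>
  </mime-type>
</mime-info>
"""

def lint(xml_text):
    """Return a list of warnings for duplicate names and self-supertypes."""
    found = []
    for mt in ET.fromstring(xml_text).findall("mime-type"):
        name = mt.get("type")
        # Count the type's own name together with all of its aliases.
        names = Counter([name] + [a.get("type") for a in mt.findall("alias")])
        for other, count in names.items():
            if count > 1:
                found.append(f"Duplicate MIME type {other} for type {name}.")
        for parent in mt.findall("sub-class-of"):
            if parent.get("type") == name:
                found.append(f"Format {name} has itself as a supertype!")
    return found

found = lint(SAMPLE)
```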