[ https://issues.apache.org/jira/browse/TIKA-1293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14372020#comment-14372020 ]
Tyler Palsulich commented on TIKA-1293:
---------------------------------------
Looks good to me. Any objections to adding this magic for HTML Netscape
bookmark files?
> Netscape bookmark files are not being detected as HTML
> ------------------------------------------------------
>
> Key: TIKA-1293
> URL: https://issues.apache.org/jira/browse/TIKA-1293
> Project: Tika
> Issue Type: Bug
> Components: detector, mime
> Reporter: Phil Lester
> Attachments: bookmarks.txt
>
>
> We are able to circumvent HTML file type detection by using the standard
> Netscape bookmark file doctype (<!DOCTYPE NETSCAPE-Bookmark-file-1>) and
> renaming the file extension to .txt. Standard HTML elements can then be
> included in the file. Some browsers (such as Firefox) will detect the .txt
> file as HTML and render it as such when it is downloaded.
> We were able to resolve this by adding a custom mime-type for text/html that
> included a match pattern for the Netscape doctype:
> <match value="&lt;!DOCTYPE NETSCAPE-Bookmark-file-1" type="string"
> offset="0:64"/>
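To illustrate what the `offset="0:64"` range in the rule above asks for, here is a minimal Python sketch. It is not Tika's matching code. It assumes, following the freedesktop.org shared-mime-info convention that Tika's MIME format is modeled on, that `start:end` is an inclusive range of allowed starting offsets and that `type="string"` means an exact, case-sensitive byte comparison. The helper names `matches_string_magic` and `looks_like_netscape_bookmarks` are hypothetical.

```python
# Hypothetical sketch of the magic rule
#   <match value="&lt;!DOCTYPE NETSCAPE-Bookmark-file-1" type="string" offset="0:64"/>
# Not Tika's implementation. Assumes "0:64" is an inclusive range of starting
# offsets and "string" is an exact, case-sensitive byte match.

NETSCAPE_DOCTYPE = b"<!DOCTYPE NETSCAPE-Bookmark-file-1"


def matches_string_magic(data: bytes, pattern: bytes, start: int, end: int) -> bool:
    """Return True if `pattern` begins at some offset in [start, end] of `data`."""
    # bytes.find only reports a hit when the whole pattern fits inside
    # data[start:end + len(pattern)], i.e. it starts no later than `end`.
    return data.find(pattern, start, end + len(pattern)) != -1


def looks_like_netscape_bookmarks(data: bytes) -> bool:
    # Only the leading bytes of the file are inspected, never its name.
    return matches_string_magic(data, NETSCAPE_DOCTYPE, 0, 64)


if __name__ == "__main__":
    sample = b"<!DOCTYPE NETSCAPE-Bookmark-file-1>\n<TITLE>Bookmarks</TITLE>\n"
    print(looks_like_netscape_bookmarks(sample))
    print(looks_like_netscape_bookmarks(b"just some plain text"))
```

Because the rule inspects content rather than the file extension, renaming such a file to .txt would not hide the doctype from a matcher of this kind; the 0:64 range also tolerates a BOM or leading whitespace before the doctype.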
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)