[ https://issues.apache.org/jira/browse/NUTCH-1991?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Iain Lopata updated NUTCH-1991:
-------------------------------
    Description: From Nutch version 1.5 onwards, the MimeUtil.java class, which 
acts as a facade to Tika for MIME type detection, attempts a match using the 
MIME type returned by the server, the file name, and the content. NUTCH-1045 
added support for an external tika-mimetypes.xml file that configures this 
process. However, content-based detection did not use this file; it fell back 
to the configuration bundled in the Tika library. Consequently, any content-based 
match rules added to Nutch's copy of the configuration file were ignored.
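To make the failure mode concrete, here is a simplified, stdlib-only model. It is not Nutch's MimeUtil and does not use Tika's API: the class and method names (MimeDetectSketch, MagicTable, detect), the detection priority order, and the application/x-my-format type are all hypothetical. It only shows that a content rule present solely in the site-supplied rule set can never match if content detection is handed the bundled rules instead.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class MimeDetectSketch {

    // One content ("magic") rule: a byte prefix and the MIME type it implies.
    // Hypothetical stand-in for a magic entry in tika-mimetypes.xml.
    static final class Rule {
        final byte[] magic;
        final String mimeType;

        Rule(byte[] magic, String mimeType) {
            this.magic = magic;
            this.mimeType = mimeType;
        }
    }

    // An ordered list of content rules, standing in for one configuration file.
    static final class MagicTable {
        private final List<Rule> rules = new ArrayList<>();

        MagicTable add(String prefix, String mimeType) {
            rules.add(new Rule(prefix.getBytes(StandardCharsets.ISO_8859_1), mimeType));
            return this;
        }

        // Returns the MIME type of the first rule whose prefix matches, or null.
        String match(byte[] content) {
            for (Rule r : rules) {
                if (content.length >= r.magic.length
                        && Arrays.equals(Arrays.copyOf(content, r.magic.length), r.magic)) {
                    return r.mimeType;
                }
            }
            return null;
        }
    }

    // Hypothetical file-name lookup with a couple of hard-coded extensions.
    static String byExtension(String fileName) {
        if (fileName.endsWith(".pdf")) return "application/pdf";
        if (fileName.endsWith(".html")) return "text/html";
        return null;
    }

    // Combines the three signals; this priority order is an assumption.
    static String detect(String serverType, String fileName, byte[] content, MagicTable magic) {
        String type = magic.match(content);             // 1. content
        if (type == null) type = byExtension(fileName); // 2. file name
        if (type == null) type = serverType;            // 3. server-declared type
        return type != null ? type : "application/octet-stream";
    }

    public static void main(String[] args) {
        MagicTable bundled = new MagicTable().add("%PDF-", "application/pdf");
        MagicTable custom = new MagicTable()
                .add("%PDF-", "application/pdf")
                .add("MYFMT", "application/x-my-format");
        byte[] body = "MYFMT v1 payload".getBytes(StandardCharsets.ISO_8859_1);

        // Bug: content detection gets the bundled rules, so the custom rule never fires.
        System.out.println(detect("text/plain", "data.bin", body, bundled)); // text/plain
        // Fix: content detection gets the rules loaded from the custom file.
        System.out.println(detect("text/plain", "data.bin", body, custom)); // application/x-my-format
    }
}
```

In this model the buggy and fixed runs differ only in which rule table reaches the content check; everything else (file name, server type) is identical.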

> Tika mime detection not using Nutch supplied tika-mimetypes.xml for content 
> based detection
> -------------------------------------------------------------------------------------------
>
>                 Key: NUTCH-1991
>                 URL: https://issues.apache.org/jira/browse/NUTCH-1991
>             Project: Nutch
>          Issue Type: Bug
>          Components: util
>            Reporter: Iain Lopata
>            Priority: Minor



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
