[ 
https://issues.apache.org/jira/browse/NUTCH-1991?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Iain Lopata updated NUTCH-1991:
-------------------------------
    Affects Version/s: 2.3.1
                       1.11
                       1.10
                       2.4
                       2.2
                       2.3
                       1.8
                       1.9
                       2.2.1

> Tika mime detection not using Nutch supplied tika-mimetypes.xml for content 
> based detection
> -------------------------------------------------------------------------------------------
>
>                 Key: NUTCH-1991
>                 URL: https://issues.apache.org/jira/browse/NUTCH-1991
>             Project: Nutch
>          Issue Type: Bug
>          Components: util
>    Affects Versions: 2.2, 2.3, 1.8, 2.4, 1.9, 2.2.1, 1.10, 1.11, 2.3.1
>            Reporter: Iain Lopata
>            Priority: Minor
>         Attachments: NUTCH-1991-1.6.patch
>
>
> From Nutch version 1.5 onwards, the MimeUtil.java class, which acts as a 
> facade to Tika for mime type detection, attempts a match using the mime 
> type returned by the server, the filename, and the content. NUTCH-1045 
> provided for the use of an external tika-mimetypes.xml file to configure 
> this process. However, the content-based detection did not use this file 
> and instead fell back to the configuration bundled with the Tika library. 
> Consequently, any content-based match rules added to the Nutch version of 
> the configuration file were not used.
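
To illustrate the behaviour described above, here is a minimal, self-contained sketch. It is only a toy model of the problem and does not use Tika's or Nutch's actual API: the class MimeDetectionSketch, the method detectByContent, the prefix-based rule maps, and the "application/x-custom" type are all hypothetical names invented for this example. It shows why a content rule added to a custom configuration has no effect if the content-based step consults only the library's bundled rules.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Toy model only: NOT Tika's API. All names here are hypothetical.
class MimeDetectionSketch {

    // Content-based detection against a given rule set (magic prefix -> mime type).
    // Returns a generic fallback type when no rule matches.
    static String detectByContent(Map<String, String> magicRules, byte[] content) {
        String head = new String(content, StandardCharsets.ISO_8859_1);
        for (Map.Entry<String, String> rule : magicRules.entrySet()) {
            if (head.startsWith(rule.getKey())) {
                return rule.getValue();
            }
        }
        return "application/octet-stream";
    }

    public static void main(String[] args) {
        // Stand-in for the rules bundled with the library.
        Map<String, String> libraryRules = new LinkedHashMap<>();
        libraryRules.put("%PDF-", "application/pdf");

        // Stand-in for a site-supplied configuration that adds one content rule.
        Map<String, String> customRules = new LinkedHashMap<>(libraryRules);
        customRules.put("#!custom", "application/x-custom");

        byte[] content = "#!custom payload".getBytes(StandardCharsets.ISO_8859_1);

        // Reported behaviour: the content step consults only the bundled rules.
        System.out.println("library rules: " + detectByContent(libraryRules, content));
        // Expected behaviour: the content step consults the supplied rules.
        System.out.println("custom rules:  " + detectByContent(customRules, content));
    }
}
```

In this model the first call falls through to the generic type while the second picks up the added rule, which mirrors the difference between detecting against the bundled configuration and detecting against the supplied file.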



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
