[ 
https://issues.apache.org/jira/browse/NUTCH-1262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13196843#comment-13196843
 ] 

Markus Jelsma commented on NUTCH-1262:
--------------------------------------

I've looked through the API and source code, but it doesn't seem to provide 
what we need. It does provide APIs to return aliases, but the example types 
are not considered aliases in Tika, judging from the tika-mimetypes.xml 
resource file.

This simple mapper also allows users to map content types to a human-friendly 
alias.
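
As a rough illustration of the idea only (this is not the code in the attached 
patch), a mapper of this kind could be sketched as below. The class name 
ContentTypeMapper and its methods are hypothetical, and stripping parameters 
such as "; charset=UTF-8" before the lookup is an assumption about how 
incoming types would be normalized.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper for illustration; not the class from NUTCH-1262's patch.
final class ContentTypeMapper {
    private final Map<String, String> aliases = new HashMap<>();

    // Register one mapping, e.g. "application/xhtml+xml" -> "text/html".
    void addMapping(String contentType, String alias) {
        aliases.put(normalize(contentType), alias);
    }

    // Return the alias for a content type, or the normalized type if unmapped.
    String map(String contentType) {
        if (contentType == null) {
            return null;
        }
        String key = normalize(contentType);
        return aliases.getOrDefault(key, key);
    }

    // Assumption: drop parameters (anything after ';') and lower-case the type.
    private static String normalize(String contentType) {
        int semicolon = contentType.indexOf(';');
        String bare = semicolon >= 0
            ? contentType.substring(0, semicolon)
            : contentType;
        return bare.trim().toLowerCase(Locale.ROOT);
    }
}
```

With a mapping from application/xhtml+xml to text/html registered, both types 
would be indexed under the same value, so a single filter could select them. 
The alias does not have to be a MIME type; a label such as "web page" would 
work the same way.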
                
> Map `duplicating` content-types to a single type
> ------------------------------------------------
>
>                 Key: NUTCH-1262
>                 URL: https://issues.apache.org/jira/browse/NUTCH-1262
>             Project: Nutch
>          Issue Type: Improvement
>    Affects Versions: 1.4
>            Reporter: Markus Jelsma
>            Assignee: Markus Jelsma
>             Fix For: 1.5
>
>         Attachments: NUTCH-1262-1.5-1.patch
>
>
> Similar or duplicating content-types can end up differently in an index. 
> With, for example, both application/xhtml+xml and text/html it is impossible 
> to use a single filter to select `web pages`.
> See also: 
> http://lucene.472066.n3.nabble.com/application-xhtml-xml-gt-text-html-td3699942.html

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
