[
https://issues.apache.org/jira/browse/NUTCH-1045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13067861#comment-13067861
]
Markus Jelsma edited comment on NUTCH-1045 at 7/19/11 5:50 PM:
---------------------------------------------------------------
Here it is, in the fetcher's mapper job:
INFO org.apache.nutch.util.MimeUtil: Detected MIME-type: application/xhtml+xml
The log also contains many repeated errors like these:
{code}
2011-07-19 17:43:20,837 ERROR org.apache.nutch.util.MimeUtil: Can't load mime.types.file : tika-mimetypes.xml using Tika's default
2011-07-19 17:43:20,845 INFO org.apache.hadoop.conf.Configuration: tika-mimetypes.xml not found
2011-07-19 17:43:20,846 ERROR org.apache.nutch.util.MimeUtil: Can't load mime.types.file : tika-mimetypes.xml using Tika's default
2011-07-19 17:43:20,856 INFO org.apache.hadoop.conf.Configuration: tika-mimetypes.xml not found
2011-07-19 17:43:20,857 ERROR org.apache.nutch.util.MimeUtil: Can't load mime.types.file : tika-mimetypes.xml using Tika's default
2011-07-19 17:43:20,858 INFO org.apache.hadoop.conf.Configuration: tika-mimetypes.xml not found
2011-07-19 17:43:20,859 ERROR org.apache.nutch.util.MimeUtil: Can't load mime.types.file : tika-mimetypes.xml using Tika's default
2011-07-19 17:43:20,860 INFO org.apache.hadoop.conf.Configuration: tika-mimetypes.xml not found
2011-07-19 17:43:20,861 ERROR org.apache.nutch.util.MimeUtil: Can't load mime.types.file : tika-mimetypes.xml using Tika's default
2011-07-19 17:43:20,863 INFO org.apache.hadoop.conf.Configuration: tika-mimetypes.xml not found
{code}
was (Author: markus17):
Here it is, in the fetcher's mapper job:
INFO org.apache.nutch.util.MimeUtil: Detected MIME-type: application/xhtml+xml
> MimeUtil to rely on default config provided by Tika
> ---------------------------------------------------
>
> Key: NUTCH-1045
> URL: https://issues.apache.org/jira/browse/NUTCH-1045
> Project: Nutch
> Issue Type: Improvement
> Affects Versions: 1.4, 2.0
> Reporter: Julien Nioche
> Priority: Minor
> Fix For: 1.4, 2.0
>
> Attachments: NUTCH-1045-1.4-v2.patch, NUTCH-1045-1.4.patch
>
>
> We currently provide conf/tika-mimetypes.xml even though it is identical to
> the one found in tika-core.jar.
> Having a mechanism for specifying a custom tika-mimetypes.xml is good, but if
> the user hasn't specified one, or if it can't be loaded, then we should rely
> on Tika's default. This way we won't need to provide conf/tika-mimetypes.xml
> anymore or keep it in sync with the default Tika one whenever we upgrade Tika.
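
The fallback described in the issue could be sketched roughly as below. This is not the code from the attached patches; it is a generic illustration in plain Java with no Tika dependency. The class name FallbackLoader and the method loadOrDefault are hypothetical. In MimeUtil the parser argument would presumably wrap Tika's MIME-type loading and the defaults argument would presumably supply Tika's built-in registry, but that mapping is an assumption.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.function.Function;
import java.util.function.Supplier;

// Hypothetical helper, not part of Nutch or Tika.
final class FallbackLoader {

    private FallbackLoader() {}

    /**
     * Returns the value parsed from the named classpath resource, or the
     * default when no name is configured, the resource is missing, or
     * reading/parsing it fails.
     */
    static <T> T loadOrDefault(String resourceName,
                               Function<InputStream, T> parser,
                               Supplier<T> defaults) {
        // Nothing configured: use the default without attempting a lookup.
        if (resourceName == null || resourceName.isEmpty()) {
            return defaults.get();
        }
        ClassLoader loader = FallbackLoader.class.getClassLoader();
        try (InputStream in = loader.getResourceAsStream(resourceName)) {
            // Configured but not on the classpath: fall back.
            if (in == null) {
                return defaults.get();
            }
            return parser.apply(in);
        } catch (IOException | RuntimeException e) {
            // Found but unreadable or unparsable: fall back as well.
            return defaults.get();
        }
    }
}
```

With this shape, an unset or unresolvable mime.types.file value ends in the same place as a working default, so conf/tika-mimetypes.xml would no longer need to ship with Nutch. Whether the fallback should be logged at ERROR, as in the log excerpt above, or at a lower level is a separate choice the sketch leaves open.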