[
https://issues.apache.org/jira/browse/NUTCH-1433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13418999#comment-13418999
]
Markus Jelsma commented on NUTCH-1433:
--------------------------------------
{code}
2012-07-20 10:15:49,402 WARN parse.ParserFactory - ParserFactory: Plugin: org.apache.nutch.parse.html.HtmlParser mapped to contentType application/xhtml+xml via parse-plugins.xml, but not enabled via plugin.includes in nutch-default.xml
2012-07-20 10:15:51,065 WARN parse.ParseUtil - Error parsing http://zh.wikipedia.org/wiki/日语 with org.apache.nutch.parse.tika.TikaParser@501ba94d
java.util.concurrent.ExecutionException: java.lang.NoSuchMethodError: org.apache.tika.mime.MediaType.set([Lorg/apache/tika/mime/MediaType;)Ljava/util/Set;
	at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:232)
	at java.util.concurrent.FutureTask.get(FutureTask.java:91)
	at org.apache.nutch.parse.ParseUtil.runParser(ParseUtil.java:162)
	at org.apache.nutch.parse.ParseUtil.parse(ParseUtil.java:93)
	at org.apache.nutch.parse.ParserChecker.run(ParserChecker.java:102)
	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
	at org.apache.nutch.parse.ParserChecker.main(ParserChecker.java:139)
Caused by: java.lang.NoSuchMethodError: org.apache.tika.mime.MediaType.set([Lorg/apache/tika/mime/MediaType;)Ljava/util/Set;
	at org.apache.tika.parser.crypto.Pkcs7Parser.getSupportedTypes(Pkcs7Parser.java:52)
	at org.apache.tika.parser.CompositeParser.getParsers(CompositeParser.java:81)
	at org.apache.tika.parser.CompositeParser.getParsers(CompositeParser.java:148)
	at org.apache.tika.config.TikaConfig.getParser(TikaConfig.java:230)
	at org.apache.nutch.parse.tika.TikaParser.getParse(TikaParser.java:79)
	at org.apache.nutch.parse.ParseCallable.call(ParseCallable.java:35)
	at org.apache.nutch.parse.ParseCallable.call(ParseCallable.java:24)
	at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
	at java.util.concurrent.FutureTask.run(FutureTask.java:138)
	at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
	at java.lang.Thread.run(Thread.java:662)
2012-07-20 10:15:51,067 WARN parse.ParseUtil - Unable to successfully parse content http://zh.wikipedia.org/wiki/日语 of type application/xhtml+xml
{code}
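The NoSuchMethodError means that the loaded `Pkcs7Parser` class calls `MediaType.set(MediaType...)`, but the `org.apache.tika.mime.MediaType` class actually loaded at runtime has no such method. A likely cause, which is an assumption and is not confirmed in this comment, is a newer tika-parsers jar sitting next to an older tika-core jar on the classpath. The sketch below is a minimal diagnostic. It uses only JDK reflection to report which jar each class came from and whether the method signature from the trace is present. `ClasspathCheck`, `hasMethod` and `locate` are hypothetical helpers and are not part of Nutch or Tika.

```java
import java.net.URL;
import java.security.CodeSource;

// Hypothetical diagnostic helper (not part of Nutch or Tika).
public class ClasspathCheck {

    // Returns true if className has a public method (static or instance) named methodName
    // with exactly these parameter types. Array types use JVM descriptor syntax,
    // e.g. "[Lorg.apache.tika.mime.MediaType;". Primitive parameter types are not supported.
    static boolean hasMethod(String className, String methodName, String... paramTypeNames) {
        try {
            Class<?> cls = Class.forName(className);
            Class<?>[] params = new Class<?>[paramTypeNames.length];
            for (int i = 0; i < params.length; i++) {
                params[i] = Class.forName(paramTypeNames[i]);
            }
            cls.getMethod(methodName, params);
            return true;
        } catch (ClassNotFoundException | NoSuchMethodException e) {
            return false;
        }
    }

    // Returns the location (usually a jar URL) the class was loaded from, or null if the
    // class is missing or has no code source (e.g. JDK bootstrap classes such as java.lang.String).
    static String locate(String className) {
        try {
            CodeSource src = Class.forName(className).getProtectionDomain().getCodeSource();
            if (src == null) {
                return null;
            }
            URL loc = src.getLocation();
            return loc == null ? null : loc.toString();
        } catch (ClassNotFoundException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        String mediaType = "org.apache.tika.mime.MediaType";
        String pkcs7 = "org.apache.tika.parser.crypto.Pkcs7Parser";
        // Signature taken from the stack trace: MediaType.set(MediaType[]) returning java.util.Set.
        boolean present = hasMethod(mediaType, "set", "[L" + mediaType + ";");
        System.out.println(mediaType + " loaded from: " + locate(mediaType));
        System.out.println(pkcs7 + " loaded from: " + locate(pkcs7));
        System.out.println("MediaType.set(MediaType...) present: " + present);
    }
}
```

Run it with the same classpath as the failing `ParserChecker` job. If the two classes resolve to jars of different Tika versions, or the method is reported missing, that would be consistent with a version mismatch in the upgrade.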
> Upgrade to Tika 1.2
> -------------------
>
> Key: NUTCH-1433
> URL: https://issues.apache.org/jira/browse/NUTCH-1433
> Project: Nutch
> Issue Type: Improvement
> Components: parser
> Reporter: Julien Nioche
> Assignee: Julien Nioche
> Fix For: 1.6, 2.1
>
> Attachments: NUTCH-1433-trunk.patch
>
>