Hi,
Right now, the crawldb state is set to success for items that have no
parser, even though parsing them throws:
Exception in thread "main" org.apache.nutch.parse.ParseException: parser not found for contentType=video/x-flv url=
    at org.apache.nutch.parse.ParseUtil.parse(ParseUtil.java:78)
    at org.apache.nutch.parse.ParserChecker.run(ParserChecker.java:101)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
    at org.apache.nutch.parse.ParserChecker.main(ParserChecker.java:138)
Should we mark these items as success at all? It doesn't seem right. I, for
one, am not interested in retrying such a URL for a very long time.
Thoughts?
Thanks