It's a good point, Markus. I would imagine that we would want to do one of two things:

1) Create a parser to fetch the contentType in question (not the aim of Nutch, but geared more towards a Tika contribution...)
2) As you mention, use a parser implementation which stores this contentType as false for parsing, e.g. skips this contentType when it is encountered again.

However, are we not able to achieve this through use of a urlfilter which denies the .x-flv suffix?

On Tue, Jan 3, 2012 at 5:18 PM, Markus Jelsma <[email protected]> wrote:
> Hi,
>
> Right now the state of the crawldb is set to success for items without a
> parser that throw:
>
> Exception in thread "main" org.apache.nutch.parse.ParseException: parser not
> found for contentType=video/x-flv url=
> at org.apache.nutch.parse.ParseUtil.parse(ParseUtil.java:78)
> at org.apache.nutch.parse.ParserChecker.run(ParserChecker.java:101)
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
> at org.apache.nutch.parse.ParserChecker.main(ParserChecker.java:138)
>
> Should we do that at all? It doesn't seem right. I, for instance, am not
> interested in retrying such an URL again for a very long time.
>
> Thoughts?
> Thanks

--
Lewis

