[
https://issues.apache.org/jira/browse/NUTCH-1562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13637247#comment-13637247
]
lufeng commented on NUTCH-1562:
-------------------------------
Hi Julien, if someone defines scoring.filter.order with filters such as opic
and depth, but those filters are not included in the plugin.includes property
(perhaps by oversight), an exception like this is thrown:
{code:java}
java.lang.NullPointerException
at org.apache.nutch.scoring.ScoringFilters.injectedScore(ScoringFilters.java:112)
at org.apache.nutch.crawl.Injector$InjectMapper.map(Injector.java:164)
at org.apache.nutch.crawl.Injector$InjectMapper.map(Injector.java:63)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:436)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:372)
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:212)
2013-04-20 21:19:10,983 ERROR crawl.Injector - Injector: java.io.IOException: Job failed!
at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1327)
at org.apache.nutch.crawl.Injector.inject(Injector.java:281)
at org.apache.nutch.crawl.Injector.run(Injector.java:318)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at org.apache.nutch.crawl.Injector.main(Injector.java:308)
{code}
Should we handle this situation or not?
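
One possible way to handle it, sketched under assumptions: resolve each name listed in scoring.filter.order against the filters that were actually loaded, and fail with a descriptive message instead of a NullPointerException when a name is missing. The class OrderedFilterResolver, its resolve method, and the example class names are hypothetical; this is not the Nutch ScoringFilters code or the attached patch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch, not the Nutch ScoringFilters class: resolves the
 * class names listed in an order property against the filters that were
 * actually loaded.
 */
final class OrderedFilterResolver {

    private OrderedFilterResolver() {}

    /**
     * @param orderValue whitespace-separated class names (may be null or empty)
     * @param available  loaded filters keyed by implementation class name
     * @return the filters in the requested order, or all available filters
     *         (in the map's iteration order) when no order is given
     */
    static <T> List<T> resolve(String orderValue, Map<String, T> available) {
        if (orderValue == null || orderValue.trim().isEmpty()) {
            return new ArrayList<>(available.values());
        }
        List<T> ordered = new ArrayList<>();
        for (String name : orderValue.trim().split("\\s+")) {
            T filter = available.get(name);
            if (filter == null) {
                // Report the misconfiguration explicitly rather than letting
                // a null entry surface later as a NullPointerException.
                throw new IllegalStateException(
                    "Filter '" + name + "' is listed in the order property "
                    + "but was not loaded; check plugin.includes");
            }
            ordered.add(filter);
        }
        return ordered;
    }
}
```

For example, calling OrderedFilterResolver.resolve("example.DepthFilter example.OpicFilter", loaded) with only example.OpicFilter present in the map would raise an IllegalStateException naming example.DepthFilter. Whether to fail fast like this or to log a warning and skip the missing filter is a design choice for the project.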
> Order of execution for scoring filters
> --------------------------------------
>
> Key: NUTCH-1562
> URL: https://issues.apache.org/jira/browse/NUTCH-1562
> Project: Nutch
> Issue Type: Bug
> Components: documentation
> Affects Versions: 1.6, 2.1
> Reporter: Julien Nioche
> Fix For: 1.7, 2.2
>
> Attachments: NUTCH-1562-trunk.patch
>
>
> The documentation in nutch-default.xml states that:
> {quote}
> <property>
> <name>scoring.filter.order</name>
> <value></value>
> <description>The order in which scoring filters are applied.
> This may be left empty (in which case all available scoring
> filters will be applied in the order defined in plugin-includes
> and plugin-excludes), or a space separated list of implementation
> classes.
> </description>
> </property>
> {quote}
> However, if no order is specified, the filters are ordered randomly and not
> in the order defined in plugin-includes.
> The other *order parameters (e.g. urlfilter.order) are documented
> differently: they "are loaded and applied in system defined order", which
> corresponds to what the code does.
> The patch attached is for 1.x and puts the code in accordance with the
> documentation by ordering the filters according to the order of the plugins,
> which gives users more control without having to specify the classes
> explicitly in scoring.filter.order.
> We could extend the same idea to the other *order params.
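
The ordering idea in the description above can be illustrated with a small sketch. It assumes the declared plugin order is already available as a list of plugin ids and that each filter class is known to belong to one plugin; the class PluginOrderSorter, its method, and the ids used in the example are hypothetical and do not reproduce the attached patch.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch: orders filter class names by the position of their
 * owning plugin in a declared plugin list.
 */
final class PluginOrderSorter {

    private PluginOrderSorter() {}

    /**
     * @param pluginOrder    plugin ids in the order the user declared them
     * @param filterToPlugin filter class name mapped to its owning plugin id
     * @return filter class names sorted by plugin position; filters whose
     *         plugin is not in the list come last, ties broken by class name
     */
    static List<String> sortByPluginOrder(List<String> pluginOrder,
                                          Map<String, String> filterToPlugin) {
        List<String> filters = new ArrayList<>(filterToPlugin.keySet());
        filters.sort(
            Comparator.comparingInt((String f) -> {
                int idx = pluginOrder.indexOf(filterToPlugin.get(f));
                return idx < 0 ? Integer.MAX_VALUE : idx;
            }).thenComparing(Comparator.<String>naturalOrder()));
        return filters;
    }
}
```

With a declared order of scoring-depth followed by scoring-opic, a filter owned by scoring-depth would sort ahead of one owned by scoring-opic regardless of the iteration order of the map, which is the determinism the description asks for.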