[ https://issues.apache.org/jira/browse/NUTCH-1385?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13293336#comment-13293336 ]
Hudson commented on NUTCH-1385:
-------------------------------

Integrated in Nutch-trunk #1868 (See [https://builds.apache.org/job/Nutch-trunk/1868/])
NUTCH-1385 More robust plug-in order properties in nutch-site.xml (Revision 1348764)

Result = SUCCESS
markus : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1348764
Files :
* /nutch/trunk/CHANGES.txt
* /nutch/trunk/src/java/org/apache/nutch/indexer/IndexingFilters.java
* /nutch/trunk/src/java/org/apache/nutch/net/URLFilters.java
* /nutch/trunk/src/java/org/apache/nutch/net/URLNormalizers.java
* /nutch/trunk/src/java/org/apache/nutch/parse/HtmlParseFilters.java
* /nutch/trunk/src/java/org/apache/nutch/scoring/ScoringFilters.java

> More robust plug-in order properties in "nutch-site.xml"
> --------------------------------------------------------
>
>                 Key: NUTCH-1385
>                 URL: https://issues.apache.org/jira/browse/NUTCH-1385
>             Project: Nutch
>          Issue Type: Improvement
>          Components: indexer, parser
>    Affects Versions: 1.5
>            Reporter: Andy Xue
>            Assignee: Markus Jelsma
>            Priority: Minor
>              Labels: filter
>             Fix For: 1.6
>
>         Attachments: nutch-1385.txt
>
>
> When listing multiple scoring filters in certain properties (listed below) in
> "nutch-site.xml", it is vital that no spaces/newlines/tabs are placed in
> front of the value content.
> E.g.:
> This is fine:
> <value>org.apache.nutch.scoring.opic.OPICScoringFilter myFilter</value>
> Either of these will generate an exception:
> <value> org.apache.nutch.scoring.opic.OPICScoringFilter myFilter</value>
> <value>
>   org.apache.nutch.scoring.opic.OPICScoringFilter
>   myFilter
> </value>
> Affects these properties in "nutch-site.xml":
> * indexingfilter.order
> * urlnormalizer.order
> * urlfilter.order
> * htmlparsefilter.order
> * scoring.filter.order
> Solution: replaced {order.split("\\s+")} with {order.trim().split("\\s+")}.
> Patch provided.
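
The whitespace problem described in the issue can be reproduced outside Nutch with plain `String.split`. The following is a minimal sketch, not the patched Nutch code: the class name `OrderSplitDemo` and the helper names `splitRaw` and `splitTrimmed` are hypothetical, and only the two split expressions are taken from the issue text. It shows that a value with leading whitespace yields an empty first token, which presumably cannot be resolved to a filter class, while trimming first yields only the class names.

```java
import java.util.Arrays;

public class OrderSplitDemo {

    // Expression used before the patch: split the raw property value.
    static String[] splitRaw(String order) {
        return order.split("\\s+");
    }

    // Expression used after the patch: trim first, then split.
    static String[] splitTrimmed(String order) {
        return order.trim().split("\\s+");
    }

    public static void main(String[] args) {
        // Value laid out over several lines, as in the failing example.
        String order = "\n  org.apache.nutch.scoring.opic.OPICScoringFilter\n  myFilter\n";

        // Leading whitespace produces an empty first element.
        System.out.println(Arrays.toString(splitRaw(order)));
        // prints "[, org.apache.nutch.scoring.opic.OPICScoringFilter, myFilter]"

        // After trim() only the two filter names remain.
        System.out.println(Arrays.toString(splitTrimmed(order)));
        // prints "[org.apache.nutch.scoring.opic.OPICScoringFilter, myFilter]"
    }
}
```

Trailing whitespace is harmless in both variants because `String.split` drops trailing empty strings; only the leading whitespace needs the `trim()` call.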