Hi Lewis [Moved to dev@]
We could indeed normalise before filtering in the mapper. It is not clear whether the current behaviour is accidental or deliberate. Please open a JIRA for this.

On a different subject, do you think you could take care of doing the RC2 for trunk? I saw that you did some work on it and I assume that Chris is too busy to do it.

Thanks

Julien

> When working on some patches for both trunk and Nutchgora branch I
> ended up doing some code analysis of the generator mappers [0] & [1]
> respectively. With specific reference to the code blocks in trunk
> (lines 175 - 185) and Nutchgora branch (lines 57 - 74) where in trunk
> we initially check if filter is true whereas in Nutchgora we check
> whether normalize is true, then check whether filter is true before
> proceeding to catch any nasties... it seems to me that there may be a
> bug in trunk but I am not sure and would like someone to comment.
>
> Thanks
>
> Lewis
>
> [0]
> https://svn.apache.org/viewvc/nutch/trunk/src/java/org/apache/nutch/crawl/Generator.java?view=markup
> [1]
> https://svn.apache.org/viewvc/nutch/branches/nutchgora/src/java/org/apache/nutch/crawl/GeneratorMapper.java?view=markup
>
> --
> Lewis
>

--
Open Source Solutions for Text Engineering

http://digitalpebble.blogspot.com/
http://www.digitalpebble.com
http://twitter.com/digitalpebble
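
PS: for whoever picks up the JIRA, here is a rough sketch of the ordering discussed in this thread: normalise first, then filter, and skip the entry if either step fails. It is not the code from Generator.java or GeneratorMapper.java. The class name, the process method and the toy normaliser and filter are all hypothetical stand-ins, and the real Nutch normaliser and filter APIs are deliberately not used here.

```java
import java.util.Locale;
import java.util.function.Predicate;
import java.util.function.UnaryOperator;

// Hypothetical illustration only; none of these names come from Nutch.
final class GeneratorOrderSketch {

    /**
     * Returns the URL to emit, or null if the entry should be skipped.
     * Normalisation runs before filtering so the filter sees the final form.
     */
    static String process(String url, boolean normalise, boolean filter,
                          UnaryOperator<String> normaliser,
                          Predicate<String> urlFilter) {
        String current = url;
        try {
            if (normalise) {
                current = normaliser.apply(current);
            }
            // A null result from the normaliser means "drop this URL".
            if (filter && current != null && !urlFilter.test(current)) {
                return null;
            }
        } catch (RuntimeException e) {
            // Any failure in either step skips the entry.
            return null;
        }
        return current;
    }

    public static void main(String[] args) {
        // Toy normaliser and filter, assumed for the example.
        UnaryOperator<String> lowerCase = u -> u.toLowerCase(Locale.ROOT);
        Predicate<String> sameSite = u -> u.startsWith("http://example.org/");

        // Normalised first, so the filter accepts it.
        System.out.println(process("HTTP://EXAMPLE.ORG/a", true, true, lowerCase, sameSite));
        // Not normalised, so the same URL is rejected by the filter.
        System.out.println(process("HTTP://EXAMPLE.ORG/a", false, true, lowerCase, sameSite));
    }
}
```

The point of the two calls in main is that the same URL can pass or fail the filter depending on whether normalisation ran first, which is why the order of the two checks matters.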

