[ https://issues.apache.org/jira/browse/NUTCH-279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12669985#action_12669985 ]
Andrzej Bialecki commented on NUTCH-279:
-----------------------------------------

Committed with some modifications. All patterns in this patch except one have been added in another commit; the remaining one (-S: ...) IMHO occurs too rarely, and the pattern would be too inclusive. The checking utility has been rewritten to follow a model similar to URLFilterChecker.

> Additions for regex-normalize
> -----------------------------
>
>                 Key: NUTCH-279
>                 URL: https://issues.apache.org/jira/browse/NUTCH-279
>             Project: Nutch
>          Issue Type: Improvement
>    Affects Versions: 0.8
>            Reporter: Stefan Neufeind
>            Assignee: Andrzej Bialecki
>             Fix For: 1.0.0
>
>         Attachments: regex-normalize.patch, regex-normalize2.patch
>
>
> Imho needed:
> 1) Extend normalize-rules to commonly used session IDs, etc.
> 2) Ship a checker to check rules easily by hand

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.