Markus Jelsma created NUTCH-3107:
------------------------------------

             Summary: QueryString normalizer to support per-host removal of qstr params
                 Key: NUTCH-3107
                 URL: https://issues.apache.org/jira/browse/NUTCH-3107
             Project: Nutch
          Issue Type: Improvement
          Components: urlnormalizer
            Reporter: Markus Jelsma
            Assignee: Markus Jelsma
             Fix For: 1.21

The QueryString normalizer currently only sorts query string key/value pairs. It could also support removing keys that are configured per host. Normally this can be done with the regex normalizer, but having a few million XML entries in the config parsed every time, and millions of regular expressions executed, is not very convenient.
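
To illustrate the idea, here is a minimal Java sketch of per-host key removal combined with the existing sorting step. It is not the NUTCH-3107 patch: the class name PerHostQueryParamRemover, the in-memory host-to-keys map, and the normalize method are all hypothetical, and how the per-host keys would actually be configured and loaded is left open. The point is only that one hash lookup per URL could replace a long list of per-host regular expressions.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch only; not the actual NUTCH-3107 implementation. */
public class PerHostQueryParamRemover {

  // Assumed structure: lower-case host name -> query keys to drop for that host.
  private final Map<String, Set<String>> keysByHost;

  public PerHostQueryParamRemover(Map<String, Set<String>> keysByHost) {
    this.keysByHost = keysByHost;
  }

  /**
   * Drops the keys configured for the URL's host, then sorts the remaining
   * key/value pairs. URI.create throws IllegalArgumentException on input
   * that is not a valid URI.
   */
  public String normalize(String url) {
    URI uri = URI.create(url);
    String host = uri.getHost();
    String query = uri.getRawQuery();
    if (host == null || query == null || query.isEmpty()) {
      return url; // nothing to normalize
    }

    // One hash lookup per URL instead of evaluating per-host regexes.
    Set<String> blocked = keysByHost.getOrDefault(
        host.toLowerCase(Locale.ROOT), Collections.emptySet());

    List<String> kept = new ArrayList<>();
    for (String pair : query.split("&")) {
      if (pair.isEmpty()) {
        continue; // skip empty segments such as "a=1&&b=2"
      }
      int eq = pair.indexOf('=');
      String key = eq < 0 ? pair : pair.substring(0, eq);
      if (!blocked.contains(key)) {
        kept.add(pair);
      }
    }
    Collections.sort(kept); // same ordering step the normalizer already does

    // Rebuild: everything before '?', the surviving pairs, then any fragment.
    StringBuilder out = new StringBuilder(url.substring(0, url.indexOf('?')));
    if (!kept.isEmpty()) {
      out.append('?').append(String.join("&", kept));
    }
    if (uri.getRawFragment() != null) {
      out.append('#').append(uri.getRawFragment());
    }
    return out.toString();
  }
}
```

With a map such as example.com -> {sid, utm_source}, a URL on example.com loses those two keys, while the same keys on any other host are kept and only sorted.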