Markus Jelsma created NUTCH-3107:
------------------------------------
Summary: QueryString normalizer to support per-host removal of
qstr params
Key: NUTCH-3107
URL: https://issues.apache.org/jira/browse/NUTCH-3107
Project: Nutch
Issue Type: Improvement
Components: urlnormalizer
Reporter: Markus Jelsma
Assignee: Markus Jelsma
Fix For: 1.21
The QueryString normalizer currently only sorts query string key/value pairs.
It could also support removing keys that are configurable per host.
Normally this can be done with the regex normalizer, but parsing a few million
XML entries from the config every time, and executing millions of regular
expressions, is not very convenient.
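
To illustrate the idea, here is a minimal sketch of per-host key removal using a hash lookup instead of regular expressions. It is not the Nutch implementation: the class name PerHostQueryParamFilter, the in-memory host-to-keys map, and the example hosts and keys are all assumptions made for illustration. Keys are compared in raw form (no percent-decoding), and the existing sorting step is left out to keep the example short.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical helper, not part of Nutch: removes configured query keys per host.
final class PerHostQueryParamFilter {

  // host -> query keys to drop; one hash lookup per URL instead of many regexes
  private final Map<String, Set<String>> keysByHost;

  PerHostQueryParamFilter(Map<String, Set<String>> keysByHost) {
    this.keysByHost = keysByHost;
  }

  String normalize(String url) {
    URI uri;
    try {
      uri = new URI(url);
    } catch (URISyntaxException e) {
      return url; // leave unparsable URLs untouched
    }
    String host = uri.getHost();
    String query = uri.getRawQuery();
    if (host == null || query == null) {
      return url;
    }
    Set<String> remove = keysByHost.get(host.toLowerCase(Locale.ROOT));
    if (remove == null) {
      return url; // no rules configured for this host
    }

    // Keep every key/value pair whose (raw) key is not listed for this host.
    List<String> kept = new ArrayList<>();
    for (String pair : query.split("&")) {
      if (pair.isEmpty()) {
        continue;
      }
      int eq = pair.indexOf('=');
      String key = eq < 0 ? pair : pair.substring(0, eq);
      if (!remove.contains(key)) {
        kept.add(pair);
      }
    }

    // The query is non-null, so the first '?' is the query delimiter.
    StringBuilder out = new StringBuilder(url.substring(0, url.indexOf('?')));
    if (!kept.isEmpty()) {
      out.append('?').append(String.join("&", kept));
    }
    if (uri.getRawFragment() != null) {
      out.append('#').append(uri.getRawFragment());
    }
    return out.toString();
  }
}
```

With a map such as example.com -> {utm_source, sid}, only URLs on example.com lose those keys; URLs on hosts without an entry are returned unchanged. How the per-host rules would actually be configured and loaded in Nutch is left open here.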