Markus Jelsma created NUTCH-3107:
------------------------------------

             Summary: QueryString normalizer to support per-host removal of qstr params
                 Key: NUTCH-3107
                 URL: https://issues.apache.org/jira/browse/NUTCH-3107
             Project: Nutch
          Issue Type: Improvement
          Components: urlnormalizer
            Reporter: Markus Jelsma
            Assignee: Markus Jelsma
             Fix For: 1.21


The QueryString normalizer currently only sorts query string key/value pairs.
It could also support removing keys that are configured per host.

Normally this can be done with the regex normalizer, but having a few million
XML entries in the config parsed every time, and millions of regular
expressions executed, is not very convenient.
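
As a rough illustration of the idea, and not the actual Nutch plugin code, the
sketch below keeps a map from host to the set of query keys to drop, so each
URL costs one hash lookup instead of a pass over many regular expressions. The
class name PerHostQueryParamRemover, its constructor and the in-memory map are
all hypothetical; how the per-host keys would be configured and loaded in
Nutch is left open here.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: sorts query pairs and removes per-host configured keys.
public class PerHostQueryParamRemover {

  // Assumed configuration shape: lower-case host -> query keys to remove.
  private final Map<String, Set<String>> keysToRemoveByHost;

  public PerHostQueryParamRemover(Map<String, Set<String>> keysToRemoveByHost) {
    this.keysToRemoveByHost = keysToRemoveByHost;
  }

  public String normalize(String urlString) {
    URI uri;
    try {
      uri = new URI(urlString);
    } catch (URISyntaxException e) {
      return urlString; // leave unparsable URLs untouched
    }
    String query = uri.getRawQuery();
    String host = uri.getHost();
    if (query == null || query.isEmpty() || host == null) {
      return urlString; // nothing to sort or remove
    }

    // Single hash lookup per URL; hosts without an entry only get sorted.
    Set<String> blocked = keysToRemoveByHost.getOrDefault(
        host.toLowerCase(Locale.ROOT), Collections.emptySet());

    List<String> kept = new ArrayList<>();
    for (String pair : query.split("&")) {
      if (pair.isEmpty()) {
        continue; // skip empty segments such as "a=1&&b=2"
      }
      int eq = pair.indexOf('=');
      String key = eq >= 0 ? pair.substring(0, eq) : pair;
      if (!blocked.contains(key)) {
        kept.add(pair);
      }
    }
    Collections.sort(kept); // sort the remaining key/value pairs

    // A query exists, so the first '?' marks where it starts.
    StringBuilder sb = new StringBuilder(
        urlString.substring(0, urlString.indexOf('?')));
    if (!kept.isEmpty()) {
      sb.append('?').append(String.join("&", kept));
    }
    if (uri.getRawFragment() != null) {
      sb.append('#').append(uri.getRawFragment());
    }
    return sb.toString();
  }
}
```

With a map containing example.org -> {utm_source, sid}, the URL
http://example.org/page?utm_source=x&b=2&a=1&sid=9 would become
http://example.org/page?a=1&b=2, while URLs on other hosts would only have
their pairs sorted.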

 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
