[ 
https://issues.apache.org/jira/browse/NUTCH-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17926347#comment-17926347
 ] 

Markus Jelsma commented on NUTCH-3107:
--------------------------------------

Initial patch attached; it seems to work fine so far. Thoughts? I have 
probably overlooked at least something.

> QueryString normalizer to support per-host removal of qstr params
> -----------------------------------------------------------------
>
>                 Key: NUTCH-3107
>                 URL: https://issues.apache.org/jira/browse/NUTCH-3107
>             Project: Nutch
>          Issue Type: Improvement
>          Components: urlnormalizer
>            Reporter: Markus Jelsma
>            Assignee: Markus Jelsma
>            Priority: Minor
>             Fix For: 1.21
>
>         Attachments: NUTCH-3107.patch
>
>
> The QueryString normalizer currently only sorts query string key/value 
> pairs. It could also support removing configurable keys on a per-host basis.
> Normally this can be done with the regex normalizer, but parsing a few 
> million XML config entries every time, and executing millions of regular 
> expressions, is not very practical.
>  
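To illustrate the idea in the description above, here is a minimal Java 
sketch of per-host key removal combined with sorting. It is not the attached 
patch and does not use Nutch's URLNormalizer interface. The class name, the 
in-memory host-to-keys map, and the exact-match host lookup are all 
assumptions made for illustration. The point is only that a hash lookup per 
URL avoids running one regular expression per rule.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch, not the code from NUTCH-3107.patch. */
class PerHostQueryNormalizerSketch {

  // Assumed rule table: lowercase host name -> query keys to drop.
  private final Map<String, Set<String>> keysToRemoveByHost;

  PerHostQueryNormalizerSketch(Map<String, Set<String>> keysToRemoveByHost) {
    this.keysToRemoveByHost = keysToRemoveByHost;
  }

  /** Drops the keys configured for the URL's host, then sorts what is left. */
  String normalize(String urlString) {
    URI uri = URI.create(urlString);
    String query = uri.getRawQuery();
    if (query == null || query.isEmpty()) {
      return urlString; // nothing to sort or remove
    }

    // One hash lookup per URL instead of one regex per rule.
    String host = uri.getHost() == null
        ? "" : uri.getHost().toLowerCase(Locale.ROOT);
    Set<String> remove =
        keysToRemoveByHost.getOrDefault(host, Collections.emptySet());

    List<String> kept = new ArrayList<>();
    for (String pair : query.split("&")) {
      if (pair.isEmpty()) {
        continue; // skip empty segments such as "a=1&&b=2"
      }
      int eq = pair.indexOf('=');
      String key = eq < 0 ? pair : pair.substring(0, eq);
      if (!remove.contains(key)) {
        kept.add(pair);
      }
    }
    Collections.sort(kept); // lexicographic order of the raw "key=value" pairs

    StringBuilder sb = new StringBuilder();
    sb.append(uri.getScheme()).append("://")
      .append(uri.getRawAuthority())
      .append(uri.getRawPath());
    if (!kept.isEmpty()) {
      sb.append('?').append(String.join("&", kept));
    }
    if (uri.getRawFragment() != null) {
      sb.append('#').append(uri.getRawFragment());
    }
    return sb.toString();
  }
}
```

With a rule that removes utm_source for example.com, 
http://example.com/page?utm_source=x&b=2&a=1 would become 
http://example.com/page?a=1&b=2, while URLs on other hosts would only have 
their parameters sorted.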



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
