[ 
https://issues.apache.org/jira/browse/NUTCH-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Markus Jelsma updated NUTCH-3107:
---------------------------------
    Description: 
The QueryString normalizer currently only sorts query string key/value pairs. 
It could also support removing keys that are configured per host.

Normally this can be done in the regex normalizer, but having a few million XML 
entries in the config parsed every time, and millions of regular expressions 
executed, is not very convenient.

The updated patch also adds support for global ignorable params, and some other 
checks on query string keys.

  was:
The QueryString normalizer currently only sorts query string key/value pairs. 
It could also support removing keys that are configured per host.

Normally this can be done in the regex normalizer, but having a few million XML 
entries in the config parsed every time, and millions of regular expressions 
executed, is not very convenient.

 


> QueryString normalizer to support per-host removal of qstr params
> -----------------------------------------------------------------
>
>                 Key: NUTCH-3107
>                 URL: https://issues.apache.org/jira/browse/NUTCH-3107
>             Project: Nutch
>          Issue Type: Improvement
>          Components: urlnormalizer
>            Reporter: Markus Jelsma
>            Assignee: Markus Jelsma
>            Priority: Minor
>             Fix For: 1.21
>
>         Attachments: NUTCH-3107-1.patch, NUTCH-3107.patch
>
>
> The QueryString normalizer currently only sorts query string key/value pairs. 
> It could also support removing keys that are configured per host.
> Normally this can be done in the regex normalizer, but having a few million XML 
> entries in the config parsed every time, and millions of regular expressions 
> executed, is not very convenient.
> The updated patch also adds support for global ignorable params, and some other 
> checks on query string keys.
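
For illustration only, the behaviour described above (sorting pairs, dropping
per-host keys, dropping globally ignorable keys) could look roughly like the
sketch below. This is not the code from the attached patches. The class name
QueryStringSketch, the GLOBAL_IGNORED and HOST_IGNORED tables, and the example
hosts and keys are all hypothetical, and the sketch assumes Java 9 or later and
an exact host-name lookup instead of any regular expression.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class QueryStringSketch {

  // Hypothetical global configuration: keys removed for every host.
  static final Set<String> GLOBAL_IGNORED = Set.of("utm_source", "utm_medium");

  // Hypothetical per-host configuration: a hash lookup by exact host name,
  // so no regular expression has to run per URL.
  static final Map<String, Set<String>> HOST_IGNORED =
      Map.of("example.com", Set.of("sessionid"));

  static String normalize(String url) {
    URI uri = URI.create(url);
    String query = uri.getRawQuery();
    if (query == null || query.isEmpty()) {
      return url; // nothing to normalize
    }

    String host = uri.getHost();
    Set<String> hostKeys = host == null
        ? Set.of()
        : HOST_IGNORED.getOrDefault(host, Set.of());

    // Keep only the pairs whose key is not ignorable.
    List<String> kept = new ArrayList<>();
    for (String pair : query.split("&")) {
      if (pair.isEmpty()) {
        continue; // skip empty segments such as "a=1&&b=2"
      }
      int eq = pair.indexOf('=');
      String key = eq >= 0 ? pair.substring(0, eq) : pair;
      if (GLOBAL_IGNORED.contains(key) || hostKeys.contains(key)) {
        continue;
      }
      kept.add(pair);
    }

    // Sort the remaining pairs so equivalent URLs normalize identically.
    Collections.sort(kept);

    String base = url.substring(0, url.indexOf('?'));
    String fragment = uri.getRawFragment() == null ? "" : "#" + uri.getRawFragment();
    return kept.isEmpty()
        ? base + fragment
        : base + "?" + String.join("&", kept) + fragment;
  }
}
```

With these hypothetical tables,
normalize("http://example.com/p?b=2&sessionid=x&a=1&utm_source=n") returns
"http://example.com/p?a=1&b=2", while the same sessionid key on a host without
an entry is kept and only sorted.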



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
