[ 
https://issues.apache.org/jira/browse/NUTCH-1098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13142291#comment-13142291
 ] 

Ferdy Galema commented on NUTCH-1098:
-------------------------------------

@Markus/Radim

I certainly do not want to nitpick about patches, but I think feedback about 
unnecessary changes or malformed patches should be given. Of course when 
applying the patch you could simply ignore or correct them, but in the end 
higher quality patches benefit all of us. It just makes the process of 
reviewing/editing/committing a lot easier.

@Radim

Do you agree that "better url-normalizer basic" is perhaps overly broad? I can 
probably think of tens of other improvements that fall under the scope of a 
better basic urlnormalizer. Discussing / managing them in separate issues is 
much more efficient than cramming them all into a single one.

Anyway, this is not meant to undermine the effort, of course. Keep up the good
work! (And feel free to disagree.)

Cheers!
                
> better url-normalizer basic
> ---------------------------
>
>                 Key: NUTCH-1098
>                 URL: https://issues.apache.org/jira/browse/NUTCH-1098
>             Project: Nutch
>          Issue Type: Improvement
>          Components: fetcher
>    Affects Versions: 1.3
>         Environment: Any
>            Reporter: Radim Kolar
>            Assignee: Markus Jelsma
>              Labels: encoding, url
>             Fix For: 1.5
>
>         Attachments: patch-urlnormalizer.diff
>
>   Original Estimate: 4h
>  Remaining Estimate: 4h
>
> Basic URL normalizer lacks 2 important features:
> - Encode space in URL into %20 to unbreak httpclient and possibly others who do 
>   not expect space inside URL
> - Ability to decode %33 encoding in URL. This is important for avoiding 
>   duplicates
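
To make the two requested features concrete, here is a minimal Java sketch.
It is not the code from patch-urlnormalizer.diff and not Nutch's actual
normalizer; the class name UrlEscapeSketch and the method normalizeEscapes
are hypothetical. It assumes that "decode %33" means decoding only escapes
that stand for RFC 3986 unreserved characters (letters, digits, "-", ".",
"_", "~"), so that escapes of reserved characters such as %2F stay as they
are.

```java
/** Hypothetical helper for illustration; not part of Nutch. */
final class UrlEscapeSketch {

    /** RFC 3986 "unreserved" characters, which never need percent-encoding. */
    private static boolean isUnreserved(int c) {
        return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z')
            || (c >= '0' && c <= '9')
            || c == '-' || c == '.' || c == '_' || c == '~';
    }

    /** Returns the value of an ASCII hex digit, or -1 if c is not one. */
    private static int hexValue(char c) {
        if (c >= '0' && c <= '9') return c - '0';
        if (c >= 'a' && c <= 'f') return c - 'a' + 10;
        if (c >= 'A' && c <= 'F') return c - 'A' + 10;
        return -1;
    }

    /**
     * Encodes literal spaces as %20 and decodes percent-escapes of
     * unreserved characters. All other input is copied unchanged.
     */
    static String normalizeEscapes(String url) {
        StringBuilder out = new StringBuilder(url.length());
        for (int i = 0; i < url.length(); i++) {
            char c = url.charAt(i);
            if (c == ' ') {
                // Feature 1: a raw space is not valid inside a URL.
                out.append("%20");
            } else if (c == '%' && i + 2 < url.length()) {
                int hi = hexValue(url.charAt(i + 1));
                int lo = hexValue(url.charAt(i + 2));
                int decoded = (hi >= 0 && lo >= 0) ? hi * 16 + lo : -1;
                if (decoded >= 0 && isUnreserved(decoded)) {
                    // Feature 2: %33 and 3 should normalize to the same URL.
                    out.append((char) decoded);
                    i += 2;
                } else {
                    // Malformed or reserved escape: keep the '%' as-is.
                    out.append(c);
                }
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(normalizeEscapes("http://example.com/a b"));  // http://example.com/a%20b
        System.out.println(normalizeEscapes("http://example.com/%33"));  // http://example.com/3
        System.out.println(normalizeEscapes("http://example.com/a%2Fb")); // http://example.com/a%2Fb
    }
}
```

Leaving reserved escapes untouched is a deliberate choice in this sketch:
decoding %2F to "/" would change the path structure, whereas decoding %33
to "3" only merges two spellings of the same URL.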

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
