[ https://issues.apache.org/jira/browse/NUTCH-1098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13142291#comment-13142291 ]
Ferdy Galema commented on NUTCH-1098:
-------------------------------------
@Markus/Radim
I certainly do not want to nitpick about patches, but I think feedback about
unnecessary changes or malformed patches should be given. Of course, when
applying the patch you could simply ignore or correct them, but in the end
higher-quality patches benefit all of us. They make the process of
reviewing/editing/committing a lot easier.
@Radim
Do you agree that "better url-normalizer basic" is perhaps overly broad? I can
probably think of tens of other improvements that fall under the scope of a
better basic urlnormalizer. Discussing / managing them in separate issues is
much more efficient than cramming them all into a single one.
Anyway this is not to undermine the effort of course. Keep up the good work!
(And feel free to disagree)
Cheers!
> better url-normalizer basic
> ---------------------------
>
> Key: NUTCH-1098
> URL: https://issues.apache.org/jira/browse/NUTCH-1098
> Project: Nutch
> Issue Type: Improvement
> Components: fetcher
> Affects Versions: 1.3
> Environment: Any
> Reporter: Radim Kolar
> Assignee: Markus Jelsma
> Labels: encoding, url
> Fix For: 1.5
>
> Attachments: patch-urlnormalizer.diff
>
> Original Estimate: 4h
> Remaining Estimate: 4h
>
> Basic URL normalizer lacks 2 important features:
> * Encode space in URL into %20 to unbreak httpclient and possibly others who do
>   not expect space inside URL
> * Ability to decode %33 encoding in URL. This is important for avoiding
>   duplicates
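
For illustration, the two features requested above could be sketched roughly
as follows. This is not the attached patch-urlnormalizer.diff and not Nutch's
actual normalizer code; the class and method names (UrlNormalizeSketch,
encodeSpaces, decodeUnreserved, normalize) are hypothetical. The sketch
assumes that only escapes of RFC 3986 "unreserved" characters (letters,
digits, '-', '.', '_', '~') should be decoded, since decoding a reserved
character such as %2F would change the meaning of the URL.

```java
final class UrlNormalizeSketch {

    // RFC 3986 "unreserved" characters: safe to store unescaped.
    private static boolean isUnreserved(int c) {
        return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z')
            || (c >= '0' && c <= '9')
            || c == '-' || c == '.' || c == '_' || c == '~';
    }

    // Value of an ASCII hex digit, or -1 if the character is not one.
    private static int hexValue(char c) {
        if (c >= '0' && c <= '9') return c - '0';
        if (c >= 'a' && c <= 'f') return c - 'a' + 10;
        if (c >= 'A' && c <= 'F') return c - 'A' + 10;
        return -1;
    }

    // Feature 1: replace every literal space with %20.
    static String encodeSpaces(String url) {
        return url.replace(" ", "%20");
    }

    // Feature 2: decode %XX escapes, but only for unreserved characters,
    // so that e.g. "%33" and "3" end up as the same URL.
    static String decodeUnreserved(String url) {
        StringBuilder out = new StringBuilder(url.length());
        int i = 0;
        while (i < url.length()) {
            char ch = url.charAt(i);
            if (ch == '%' && i + 2 < url.length()) {
                int hi = hexValue(url.charAt(i + 1));
                int lo = hexValue(url.charAt(i + 2));
                if (hi >= 0 && lo >= 0 && isUnreserved(hi * 16 + lo)) {
                    out.append((char) (hi * 16 + lo));
                    i += 3;
                    continue;
                }
            }
            // Not a decodable escape: copy the character unchanged.
            out.append(ch);
            i++;
        }
        return out.toString();
    }

    // Apply both steps: spaces first, then escape decoding.
    static String normalize(String url) {
        return decodeUnreserved(encodeSpaces(url));
    }

    public static void main(String[] args) {
        System.out.println(normalize("http://example.com/page %33.html"));
    }
}
```

With this sketch, "http://example.com/%33" and "http://example.com/3"
normalize to the same string, which is the duplicate-avoidance case mentioned
in the issue, while an escape such as %2F is left untouched.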