[
https://issues.apache.org/jira/browse/NUTCH-1098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13144099#comment-13144099
]
Chris A. Mattmann commented on NUTCH-1098:
------------------------------------------
Guys: let's change the tone of this issue, OK?
Radim, thanks for your patch. Sorry that it didn't get applied or that folks
tried to engage in feedback/discussion with you on it. I would encourage you
not to get discouraged, and I appreciate your effort in trying to contribute to
the Apache Nutch project.
The committers are the ones that have to figure out how to maintain things, and
sometimes we get hung up on (yes, I'll agree) less important issues. I'm going
to recommend that everyone just table those for the moment and that we move
forward here.
Here are some concrete next steps:
1. Ferdy: is it possible to commit a portion of this patch that you do
understand? Then we could leave the part that you don't uncommitted. This has 2
immediate goals:
- gives Radim a good feeling for contributing to the project -- he deserves
that.
- gives us the ability to cherry pick what we understand and are willing to
maintain
2. Radim: if you want to help in improving the formatting and other requested
issues, great. If you don't, then that's fine too. At that point, though, the
maintenance/evolution of the patch will transition more to the Nutch folks,
and you might not be as involved with it unless you get on board with what the
guys have decided are their code formatting and patch generation guidelines.
Thanks!
> better url-normalizer basic
> ---------------------------
>
> Key: NUTCH-1098
> URL: https://issues.apache.org/jira/browse/NUTCH-1098
> Project: Nutch
> Issue Type: Improvement
> Components: fetcher
> Affects Versions: 1.3
> Environment: Any
> Reporter: Radim Kolar
> Assignee: Markus Jelsma
> Labels: encoding, url
> Fix For: 1.5
>
> Attachments: patch-with-utf8-encoding.diff
>
> Original Estimate: 4h
> Remaining Estimate: 4h
>
> Basic URL normalizer lacks 2 important features:
> - Encode space in URL into %20 to unbreak httpclient and possibly others who
>   do not expect space inside URL
> - Ability to decode %33 encoding in URL. This is important for avoiding
>   duplicates
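
For readers following along, the two features requested in the issue can be
sketched roughly as below. This is not the attached patch and not Nutch's
actual normalizer code; the class name UrlEscapeSketch and the method
normalize are made up for illustration. It assumes that "decoding %33" means
turning percent-escapes of unreserved characters (letters, digits, '-', '.',
'_', '~') back into the literal character, while leaving every other escape
untouched.

```java
public class UrlEscapeSketch {

    // Unreserved characters (assumed here to follow RFC 3986):
    // letters, digits, '-', '.', '_' and '~'.
    private static boolean isUnreserved(char c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')
            || (c >= '0' && c <= '9')
            || c == '-' || c == '.' || c == '_' || c == '~';
    }

    // Value of an ASCII hex digit, or -1 if c is not one.
    private static int hexValue(char c) {
        if (c >= '0' && c <= '9') return c - '0';
        if (c >= 'a' && c <= 'f') return c - 'a' + 10;
        if (c >= 'A' && c <= 'F') return c - 'A' + 10;
        return -1;
    }

    // Hypothetical helper: encodes spaces as %20 and decodes escapes
    // such as %33 that stand for unreserved characters.
    public static String normalize(String url) {
        StringBuilder out = new StringBuilder(url.length() + 8);
        int i = 0;
        while (i < url.length()) {
            char c = url.charAt(i);
            if (c == ' ') {
                out.append("%20");
                i++;
                continue;
            }
            if (c == '%' && i + 2 < url.length()) {
                int hi = hexValue(url.charAt(i + 1));
                int lo = hexValue(url.charAt(i + 2));
                if (hi >= 0 && lo >= 0) {
                    char decoded = (char) (hi * 16 + lo);
                    if (isUnreserved(decoded)) {
                        out.append(decoded);        // e.g. %33 becomes 3
                    } else {
                        out.append(url, i, i + 3);  // keep e.g. %2F as is
                    }
                    i += 3;
                    continue;
                }
            }
            out.append(c); // ordinary character or malformed escape
            i++;
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // prints http://example.com/a%20b/3.html
        System.out.println(normalize("http://example.com/a b/%33.html"));
    }
}
```

Escapes for reserved characters such as %2F are deliberately left alone in
this sketch, since decoding them could change which resource the URL refers
to. Whether the real patch draws the line in the same place is not stated in
the issue.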