[
https://issues.apache.org/jira/browse/NUTCH-1098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13142699#comment-13142699
]
Radim Kolar commented on NUTCH-1098:
------------------------------------
a/ Please direct your complaints about the quality of git-generated patches to
the git mailing list. I am not going to generate patches for you manually by
running diff -Naur.
b/ If you used something better than SVN (hg, git, bzr), you could cherry-pick
changes from my branch, create a new branch for every subtask, attach the
branches to JIRA reports, and then discuss them separately.
c/ It would be more efficient if I didn't spend 10x more time on pointless
discussions than on coding.
> better url-normalizer basic
> ---------------------------
>
> Key: NUTCH-1098
> URL: https://issues.apache.org/jira/browse/NUTCH-1098
> Project: Nutch
> Issue Type: Improvement
> Components: fetcher
> Affects Versions: 1.3
> Environment: Any
> Reporter: Radim Kolar
> Assignee: Markus Jelsma
> Labels: encoding, url
> Fix For: 1.5
>
> Attachments: patch-urlnormalizer.diff
>
> Original Estimate: 4h
> Remaining Estimate: 4h
>
> The basic URL normalizer lacks 2 important features:
> * Encoding a space in a URL as %20, to unbreak httpclient and possibly other
>   clients that do not expect a space inside a URL.
> * The ability to decode encodings such as %33 in a URL. This is important
>   for avoiding duplicates.
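
The two requested behaviours can be illustrated with a minimal sketch. This is
not the attached patch and not Nutch's Java normalizer: it is written in
Python, the name normalize_url is hypothetical, and it assumes that only
escapes of RFC 3986 "unreserved" characters (letters, digits, "-", ".", "_",
"~") are safe to decode, so escapes such as %2F are left untouched.

```python
import re

# Assumption: only RFC 3986 "unreserved" characters are decoded, because
# decoding a reserved character (e.g. %2F -> "/") would change the URL's meaning.
UNRESERVED = set(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789-._~"
)

# Matches one percent-escape: "%" followed by exactly two hex digits.
PERCENT_ESCAPE = re.compile(r"%([0-9A-Fa-f]{2})")


def _decode_if_unreserved(match):
    """Return the decoded character if it is unreserved, else the escape as-is."""
    char = chr(int(match.group(1), 16))
    return char if char in UNRESERVED else match.group(0)


def normalize_url(url):
    """Hypothetical helper applying the two normalizations from the issue."""
    # 1. Encode literal spaces so HTTP clients never see a raw space.
    url = url.replace(" ", "%20")
    # 2. Decode needless escapes so equivalent URLs compare equal.
    return PERCENT_ESCAPE.sub(_decode_if_unreserved, url)


print(normalize_url("http://example.com/my page/%33.html"))
# prints http://example.com/my%20page/3.html
```

With this sketch, http://example.com/%33 and http://example.com/3 normalize to
the same string, which is what makes duplicate detection possible.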