[ https://issues.apache.org/jira/browse/NUTCH-1098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13144160#comment-13144160 ]
Radim Kolar commented on NUTCH-1098:
------------------------------------
If you are so clever and hard working, then stop undeleting my patch and write
a better one yourself. I am licensing my work under the Affero GPL v3 from now on.
You simply need months to discuss a trivial code change. Everybody here claims
to be smart and hard working, but look at your results: a mere 13 trivial
commits in October. Look at my results: I have 2.1 billion files indexed in
4 months.
I reworked a major portion of Nutch and I don't want to spend years waiting to
see if and when it will ever be merged. I have the Hadoop 0.21 API, a generator
with a pluggable algorithm, a fixed Maven build, a database backend switched to
Cassandra, and other stuff. For me it is far better to just pull 20 of your
patches per month from GitHub and not waste my time with you in pointless
discussions like git vs. svn diff format.
> better url-normalizer basic
> ---------------------------
>
> Key: NUTCH-1098
> URL: https://issues.apache.org/jira/browse/NUTCH-1098
> Project: Nutch
> Issue Type: Improvement
> Components: fetcher
> Affects Versions: 1.3
> Environment: Any
> Reporter: Radim Kolar
> Assignee: Markus Jelsma
> Labels: encoding, url
> Fix For: 1.5
>
> Attachments: patch-with-utf8-encoding.diff
>
> Original Estimate: 4h
> Remaining Estimate: 4h
>
> The basic URL normalizer lacks 2 important features:
> - Encoding spaces in URLs as %20, to unbreak httpclient and possibly other
>   clients that do not expect a space inside a URL.
> - The ability to decode %33-style percent-encoding in URLs. This is important
>   for avoiding duplicates.
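
For illustration only, here is a minimal Python sketch of the two normalization steps the issue description asks for. It is not the attached patch and not Nutch code: the names normalize_url, UNRESERVED and _decode_unreserved are hypothetical. It also assumes that only escapes of RFC 3986 "unreserved" characters (letters, digits, "-", ".", "_", "~") are safe to decode, so %33 becomes "3" while an escape such as %2F is kept.

```python
import re

# Unreserved characters per RFC 3986 section 2.3. Decoding an escape of one
# of these does not change the meaning of the URL (assumption of this sketch).
UNRESERVED = set(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789-._~"
)

# Matches one percent-escape such as %33 or %2f.
_ESCAPE = re.compile(r"%([0-9A-Fa-f]{2})")


def _decode_unreserved(match):
    """Decode an escape if it stands for an unreserved character."""
    char = chr(int(match.group(1), 16))
    if char in UNRESERVED:
        return char
    # Keep every other escape, but uppercase its hex digits so that
    # %2f and %2F normalize to the same string.
    return "%" + match.group(1).upper()


def normalize_url(url):
    """Hypothetical helper showing both steps from the issue description."""
    # Step 1: decode needless escapes (e.g. %33 -> 3) to avoid duplicates.
    url = _ESCAPE.sub(_decode_unreserved, url)
    # Step 2: encode literal spaces, which some HTTP clients reject.
    return url.replace(" ", "%20")


if __name__ == "__main__":
    print(normalize_url("http://example.com/a b/%33"))
```

Decoding runs before space encoding so that the freshly written %20 is never re-examined; escapes such as %25 are left untouched, which avoids double decoding.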