Markus Jelsma created NUTCH-1711:
------------------------------------
Summary: Normalizer does not encode exclamation mark
Key: NUTCH-1711
URL: https://issues.apache.org/jira/browse/NUTCH-1711
Project: Nutch
Issue Type: Bug
Affects Versions: 1.7
Reporter: Markus Jelsma
Assignee: Markus Jelsma
Fix For: 1.8
{code}
$ bin/nutch org.apache.nutch.net.URLNormalizerChecker
Checking combination of all URLNormalizers available
http://nutch.apache.org/bla!
http://nutch.apache.org/bla!
{code}
Until just now I never noticed that many URL encoders do not encode the exclamation mark.
SolrCloud uses that character as the delimiter in composite document IDs, so if an ID
(here, the URL) ends with an exclamation mark, you will get an error!
Any thoughts on this?
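A minimal sketch of one possible normalization step. It assumes the fix is simply to percent-encode every `!` as `%21` in the path, query and fragment, and to leave the scheme and authority untouched. The class `ExclamationEncoder` and its method are hypothetical and are not part of Nutch's URLNormalizer API. One caveat: RFC 3986 lists `!` among the reserved sub-delimiters, and URIs that differ only in a reserved character versus its percent-encoding are not guaranteed to be equivalent. A server may therefore treat `bla!` and `bla%21` as different resources.

```java
/**
 * Hypothetical helper, not part of Nutch: percent-encodes '!' as "%21"
 * everywhere after the authority (path, query, fragment) of a URL string.
 */
final class ExclamationEncoder {
    private ExclamationEncoder() {}

    static String encodeExclamation(String url) {
        int schemeSep = url.indexOf("://");
        // Without "scheme://" the whole string is treated as a path.
        int pathStart = (schemeSep < 0) ? 0 : url.length();
        if (schemeSep >= 0) {
            // The authority ends at the first '/', '?' or '#' after "scheme://".
            for (int i = schemeSep + 3; i < url.length(); i++) {
                char c = url.charAt(i);
                if (c == '/' || c == '?' || c == '#') {
                    pathStart = i;
                    break;
                }
            }
        }
        // Keep scheme and authority as-is; encode '!' only in the remainder.
        return url.substring(0, pathStart)
                + url.substring(pathStart).replace("!", "%21");
    }

    public static void main(String[] args) {
        // Prints http://nutch.apache.org/bla%21
        System.out.println(encodeExclamation("http://nutch.apache.org/bla!"));
    }
}
```

Restricting the rewrite to the part after the authority avoids touching `!` in user info or host names. A real normalizer would more likely be configured through Nutch's existing normalizer plugins than hard-coded like this.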
--
This message was sent by Atlassian JIRA
(v6.1.5#6160)