Outlinks are not properly normalized
------------------------------------
Key: NUTCH-1174
URL: https://issues.apache.org/jira/browse/NUTCH-1174
Project: Nutch
Issue Type: Bug
Components: parser
Affects Versions: 1.3
Reporter: Markus Jelsma
Assignee: Markus Jelsma
Fix For: 1.5
In ParseOutputFormat, the toUrl is read from each Outlink and processed as a
String. This String is filtered, normalized, etc., but the original Outlink
object is what actually gets added to the output. The normalized URL in toUrl
is never written back to the Outlink object, so the unnormalized URL is kept.
This issue adds a setUrl method to Outlink, which ParseOutputFormat uses to
overwrite the unnormalized URL with the normalized one.
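
A minimal sketch of the problem and the proposed fix follows. It is not the
actual Nutch code: the Outlink class here is a simplified stand-in, and
normalize(), collectWithoutWriteBack() and collectWithWriteBack() are
hypothetical helpers invented for illustration. Only the setUrl name comes
from this issue.

```java
import java.util.ArrayList;
import java.util.List;

public class OutlinkNormalizationSketch {

    // Simplified stand-in for Nutch's Outlink (assumption, not the real class).
    static class Outlink {
        private String toUrl;
        private final String anchor;

        Outlink(String toUrl, String anchor) {
            this.toUrl = toUrl;
            this.anchor = anchor;
        }

        String getToUrl() { return toUrl; }
        String getAnchor() { return anchor; }

        // The setter proposed in this issue.
        void setUrl(String toUrl) { this.toUrl = toUrl; }
    }

    // Hypothetical normalizer: trims whitespace and strips the fragment.
    // The real normalizers in Nutch are configurable and do more than this.
    static String normalize(String url) {
        String trimmed = url.trim();
        int hash = trimmed.indexOf('#');
        return hash >= 0 ? trimmed.substring(0, hash) : trimmed;
    }

    // Reported behavior: the local String is normalized, but the original
    // Outlink (still holding the unnormalized URL) is what gets collected.
    static List<Outlink> collectWithoutWriteBack(Outlink[] links) {
        List<Outlink> out = new ArrayList<>();
        for (Outlink link : links) {
            String toUrl = normalize(link.getToUrl());
            if (toUrl.isEmpty()) continue; // stands in for URL filtering
            out.add(link);                 // toUrl is never written back
        }
        return out;
    }

    // Proposed behavior: write the normalized URL back before collecting.
    static List<Outlink> collectWithWriteBack(Outlink[] links) {
        List<Outlink> out = new ArrayList<>();
        for (Outlink link : links) {
            String toUrl = normalize(link.getToUrl());
            if (toUrl.isEmpty()) continue;
            link.setUrl(toUrl);            // overwrite the unnormalized URL
            out.add(link);
        }
        return out;
    }

    public static void main(String[] args) {
        Outlink[] before = { new Outlink(" http://example.com/page#top ", "a") };
        Outlink[] after = { new Outlink(" http://example.com/page#top ", "a") };

        // Prints the URL unchanged, including whitespace and fragment.
        System.out.println("[" + collectWithoutWriteBack(before).get(0).getToUrl() + "]");
        // Prints [http://example.com/page]
        System.out.println("[" + collectWithWriteBack(after).get(0).getToUrl() + "]");
    }
}
```

Note that the second variant mutates the Outlink objects it is given; whether
that is acceptable depends on how the outlinks are used elsewhere.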