[ https://issues.apache.org/jira/browse/NUTCH-952?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sebastian Nagel updated NUTCH-952:
----------------------------------

    Attachment: test_nutch_952.html

Was fixed by NUTCH-797 for v 1.4 (2.x will follow soon).
Example link (attached) works now for 1.8 (both with parse-html and parse-tika):
{code}
% nutch parsechecker http://localhost/test_nutch_952.html
...
Outlinks: 1
  outlink: toUrl: http://bbs.soso.com/search?w=ruby%20on%20rails&ty=c&sd=0
{code}

> fix outlink which started with '?' in html parser
> -------------------------------------------------
>
>                 Key: NUTCH-952
>                 URL: https://issues.apache.org/jira/browse/NUTCH-952
>             Project: Nutch
>          Issue Type: Bug
>          Components: parser
>    Affects Versions: nutchgora
>            Reporter: Stondet
>         Attachments: NUTCH-952-v2.patch, test_nutch_952.html
>
>
> <a href="?w=ruby%20on%20rails&ty=c&sd=0" >ruby on rails</a>(a snippet from http://bbs.soso.com/search?ty=c&sd=0&w=rails)
> outlink parsed from above link: http://bbs.soso.com/?w=ruby%20on%20rails&ty=c&sd=0
> but expected is http://bbs.soso.com/search?w=ruby%20on%20rails&ty=c&sd=0
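
For reference, the expected result in the report matches how a query-only relative reference resolves under RFC 3986: the base path is kept and only the query is replaced. The sketch below is an illustration only. It uses Python's standard urllib.parse.urljoin rather than Nutch's own code, and the variable names are made up for the example; the two URLs are the ones quoted in the issue.

```python
from urllib.parse import urljoin

# Page on which the link was found (from the issue description).
base = "http://bbs.soso.com/search?ty=c&sd=0&w=rails"

# The href value: a relative reference consisting of a query only.
href = "?w=ruby%20on%20rails&ty=c&sd=0"

# A query-only reference keeps the base path ("/search") and
# replaces the base query with the one from the reference.
resolved = urljoin(base, href)
print(resolved)
```

The printed URL keeps the /search path segment, which is the behaviour the reporter expected, whereas the outlink originally reported had lost it.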