[ https://issues.apache.org/jira/browse/NUTCH-436?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Andrew Groh updated NUTCH-436:
------------------------------

    Description:
If you have a base URL of the form:
http://a/b/c/d;p?q#f

Embedded URL: ?y
Correct Absolute URL: http://a/b/c/d;p?y
Nutch Generated URL: http://a/b/c/?y

Embedded URL: ;x
Correct Absolute URL: http://a/b/c/d;x
Nutch Generated URL: http://a/b/c/;x

See section 4, steps 5-7 of RFC 1808 for the definition of the correct set of steps, and section 5.1 for example
http://www.ietf.org/rfc/rfc1808.txt

  was:
If you have a base URL of the form:
http://a/b/c/d;p?q#f

Embedded URL    Correct Absolute URL    Nutch Generated URL
?y              http://a/b/c/d;p?y      http://a/b/c/?y
;x              http://a/b/c/d;x        http://a/b/c/;x

See section 4, steps 5-7 of RFC 1808 for the definition of the correct set of steps, and section 5.1 for example
http://www.ietf.org/rfc/rfc1808.txt

> Incorrect handling of relative paths when the embedded URL path is empty
> ------------------------------------------------------------------------
>
>                 Key: NUTCH-436
>                 URL: https://issues.apache.org/jira/browse/NUTCH-436
>             Project: Nutch
>          Issue Type: Bug
>          Components: fetcher
>            Reporter: Andrew Groh
>            Priority: Critical
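
For reference, the empty-path case described in this issue (RFC 1808, section 4, step 5) can be sketched roughly as below. This is only an illustration, not the Nutch code or a proposed patch: the class name EmptyPathResolver and its methods are hypothetical, it handles only embedded URLs whose path is empty, and it assumes the scheme and net_loc of the base URL contain no ';', '?' or '#'.

```java
/** Hypothetical sketch of RFC 1808 section 4, step 5 (embedded URL with an empty path). */
final class EmptyPathResolver {

    /** Resolves an empty-path embedded URL such as "?y", ";x" or "#s" against an absolute base URL. */
    static String resolve(String base, String embedded) {
        // Step 2a: an entirely empty embedded URL inherits the whole base URL.
        if (embedded.isEmpty()) {
            return base;
        }
        String[] b = split(base);
        String[] e = split(embedded);
        if (!e[0].isEmpty()) {
            // Anything with a scheme, net_loc or path is outside the scope of this sketch.
            throw new IllegalArgumentException("embedded URL path is not empty: " + embedded);
        }
        String params = e[1];
        String query = e[2];
        if (params.isEmpty()) {
            params = b[1];          // step 5a: no embedded params, inherit the base params
            if (query.isEmpty()) {
                query = b[2];       // step 5b: no embedded query, inherit the base query
            }
        }
        // Step 7: recombine. The path always comes from the base URL (no last-segment
        // stripping), and the fragment is never inherited from the base URL.
        StringBuilder out = new StringBuilder(b[0]);
        if (!params.isEmpty()) out.append(';').append(params);
        if (!query.isEmpty()) out.append('?').append(query);
        if (!e[3].isEmpty()) out.append('#').append(e[3]);
        return out.toString();
    }

    /** Splits a URL into {scheme + net_loc + path, params, query, fragment}; absent parts are "". */
    private static String[] split(String url) {
        String fragment = "";
        String query = "";
        String params = "";
        int i = url.indexOf('#');
        if (i >= 0) { fragment = url.substring(i + 1); url = url.substring(0, i); }
        i = url.indexOf('?');
        if (i >= 0) { query = url.substring(i + 1); url = url.substring(0, i); }
        i = url.indexOf(';');
        if (i >= 0) { params = url.substring(i + 1); url = url.substring(0, i); }
        return new String[] {url, params, query, fragment};
    }

    public static void main(String[] args) {
        String base = "http://a/b/c/d;p?q#f";
        System.out.println(resolve(base, "?y"));  // http://a/b/c/d;p?y
        System.out.println(resolve(base, ";x"));  // http://a/b/c/d;x
    }
}
```

The point of the sketch is that the last path segment ("d") is kept whenever the embedded path is empty. The path merging of step 6, which drops that segment, applies only when the embedded URL has a non-empty relative path.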