>Number: 5263 >Category: general >Synopsis: Appending a '/' followed by junk to a valid url does NOT cause >an error >Confidential: no >Severity: non-critical >Priority: medium >Responsible: apache >State: open >Class: sw-bug >Submitter-Id: apache >Arrival-Date: Fri Nov 5 14:40:00 PST 1999 >Last-Modified: >Originator: [EMAIL PROTECTED] >Organization: apache >Release: 1.3.6 >Environment: Solaris 7, fully patched. sun4u sparc SUNW,Ultra-2 >Description: My site is running Infoseek's UltraSEEK search engine. In the process of its indexing, I noticed a problem in the way Apache parses URLs. The best way to describe this is with an example. This is a CORRECT URL:
http://www.nara.gov/regional/seattle.html However, due to a badly-coded link, the search engine tried urls like: http://www.nara.gov/regional/seattle.html/volunteer/contacts/seainfo.html ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ And oddly enough, these worked! Apache always returned the original page (seattle.html), even with the "junk" appended to the URL (the ^underlined^ part of the URL). I would have expected Apache to return a 404. This is wreaking havoc with the search engine because it's going into infinite loops and indexing pages (like the seattle.html page) millions of times. Also, this messes up what the browser thinks the document's base URL is, so images are broken on these pages. This seems to happen even on your own site: http://www.apache.org/index.html/blah/lasdkj/asdjhkjasjad returns the same page as http://www.apache.org/index.html I have also tried this with IIS and Netscape Enterprise Server--both of which return 404. >How-To-Repeat: (see full description for more examples) http://www.apache.org/index.html/blah/lasdkj/asdjhkjasjad returns the same page as http://www.apache.org/index.html instead of an error. >Fix: My guess is that Apache is allowing certain delimiters after the .html that it shouldn't. '?' obviously should work, but I don't think '/' should, at least not for non-CGI pages. >Audit-Trail: >Unformatted: [In order for any reply to be added to the PR database, you need] [to include <[EMAIL PROTECTED]> in the Cc line and make sure the] [subject line starts with the report component and number, with ] [or without any 'Re:' prefixes (such as "general/1098:" or ] ["Re: general/1098:"). If the subject doesn't match this ] [pattern, your message will be misfiled and ignored. The ] ["apbugs" address is not added to the Cc line of messages from ] [the database automatically because of the potential for mail ] [loops. If you do not include this Cc, your reply may be ig- ] [nored unless you are responding to an explicit request from a ] [developer. Reply only with text; DO NOT SEND ATTACHMENTS! ]