>Number:         5263
>Category:       general
>Synopsis:       Appending a '/' followed by junk to a valid URL does NOT cause an error
>Confidential:   no
>Severity:       non-critical
>Priority:       medium
>Responsible:    apache
>State:          open
>Class:          sw-bug
>Submitter-Id:   apache
>Arrival-Date:   Fri Nov  5 14:40:00 PST 1999
>Last-Modified:
>Originator:     [EMAIL PROTECTED]
>Organization:
apache
>Release:        1.3.6
>Environment:
Solaris 7, fully patched.  sun4u sparc SUNW,Ultra-2
>Description:
My site is running Infoseek's UltraSEEK search engine.  In the process of its 
indexing, I noticed a problem in the way Apache parses URLs.  The best way to 
describe this is with an example.  This is a CORRECT URL:

     http://www.nara.gov/regional/seattle.html

However, due to a badly-coded link, the search engine tried URLs like:

     http://www.nara.gov/regional/seattle.html/volunteer/contacts/seainfo.html
                                              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
And oddly enough, these worked!  Apache always returned the original page 
(seattle.html), even with the "junk" appended to the URL (the ^underlined^ part 
of the URL).  I would have expected Apache to return a 404.  This is wreaking 
havoc with the search engine because it's going into infinite loops and 
indexing pages (like the seattle.html page) millions of times.  Also, this 
messes up what the browser thinks the document's base URL is, so images are 
broken on these pages.
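
To illustrate the base URL problem, here is a minimal Python sketch that 
resolves a relative image reference against both URLs.  The image path 
"images/logo.gif" is a made-up example, not an actual file on the site; the 
resolution itself is the standard relative-URL behavior that browsers follow.

```python
from urllib.parse import urljoin

GOOD = "http://www.nara.gov/regional/seattle.html"
JUNK = "http://www.nara.gov/regional/seattle.html/volunteer/contacts/seainfo.html"

# Hypothetical relative reference, as it might appear in an <img src="...">.
IMAGE = "images/logo.gif"

# A relative reference replaces only the last path segment of the base URL.
good_image = urljoin(GOOD, IMAGE)
junk_image = urljoin(JUNK, IMAGE)

print(good_image)  # http://www.nara.gov/regional/images/logo.gif
print(junk_image)  # http://www.nara.gov/regional/seattle.html/volunteer/contacts/images/logo.gif
```

With the junk URL as the base, the image request lands "below" seattle.html, 
so the browser never asks for the real image location.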

This seems to happen even on your own site:

http://www.apache.org/index.html/blah/lasdkj/asdjhkjasjad
returns the same page as http://www.apache.org/index.html

I have also tried this with IIS and Netscape Enterprise Server--both of which 
return 404.
>How-To-Repeat:
(see full description for more examples)
     http://www.apache.org/index.html/blah/lasdkj/asdjhkjasjad 
returns the same page as 
     http://www.apache.org/index.html 
instead of an error.
>Fix:
My guess is that Apache is allowing certain delimiters after the .html that it 
shouldn't.  '?' obviously should work, but I don't think '/' should, at least 
not for non-CGI pages.
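
The behavior is at least consistent with a server that walks the path one 
segment at a time, stops at the first segment that names a real file, and 
treats the remainder as extra path information.  The following Python sketch 
models that idea only; it is not Apache's code.  The function name, the FILES 
set, and the allow_path_info switch are all assumptions made for illustration.

```python
from typing import Optional, Set, Tuple

# Hypothetical document tree: the only files that "exist" on the server.
FILES = {"/index.html", "/regional/seattle.html"}


def split_path_info(
    url_path: str, files: Set[str], allow_path_info: bool = True
) -> Optional[Tuple[str, str]]:
    """Return (file, trailing_path) for a request path, or None for a 404."""
    segments = [s for s in url_path.split("/") if s]
    current = ""
    for i, segment in enumerate(segments):
        current += "/" + segment
        if current in files:
            rest = segments[i + 1:]
            trailing = "/" + "/".join(rest) if rest else ""
            if trailing and not allow_path_info:
                return None  # the behavior this report expected: 404
            return current, trailing
    return None  # no prefix of the path names an existing file


print(split_path_info("/index.html/blah/lasdkj/asdjhkjasjad", FILES))
print(split_path_info("/index.html/blah/lasdkj/asdjhkjasjad", FILES,
                      allow_path_info=False))
```

With allow_path_info left on, the junk is silently split off and the original 
page is served; with it off, the same request is rejected, which is what IIS 
and Netscape Enterprise Server appear to do.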
>Audit-Trail:
>Unformatted:
[In order for any reply to be added to the PR database, you need]
[to include <[EMAIL PROTECTED]> in the Cc line and make sure the]
[subject line starts with the report component and number, with ]
[or without any 'Re:' prefixes (such as "general/1098:" or      ]
["Re: general/1098:").  If the subject doesn't match this       ]
[pattern, your message will be misfiled and ignored.  The       ]
["apbugs" address is not added to the Cc line of messages from  ]
[the database automatically because of the potential for mail   ]
[loops.  If you do not include this Cc, your reply may be ig-   ]
[nored unless you are responding to an explicit request from a  ]
[developer.  Reply only with text; DO NOT SEND ATTACHMENTS!     ]


