Yes, I have run into this before. Mongrel rejects any request whose HTTP URI is invalid, and one common cause is characters that are not properly escaped, which is the case in your example. When one of the developers of my app brought this up before, he was told by the Mongrel developer that this was intentional and would not be changed.

I didn't like this then, and I don't like it now, for a variety of reasons, including that my app needs to respond to URLs sent by third parties that are not under my control. Perhaps the current Mongrel developers (IS there even any active development on Mongrel?) have a different opinion, and this could be changed or made configurable.

In the meantime, I have gotten around it with some mod_rewrite rules in Apache in front of Mongrel, which take illegal URLs and escape/rewrite them to be legal. However, because of some weird behavior (bugs?) in Apache and mod_rewrite around escaping, and the difficulty of controlling escaping in the Apache conf, I actually had to use an external Perl script too. Here's what I do:

Apache conf, applying to Mongrel URLs (which in my setup are all URLs on a given Apache virtual host):

 RewriteEngine on
 RewriteMap query_escape prg:/data/web/findit/Umlaut/distribution/script/rewrite_map.pl
 #RewriteLock /var/lock/subsys/apache.rewrite.lock
 RewriteCond %{query_string} ^(.*[\>\<].*)$
 RewriteRule ^(.*)$ $1?${query_escape:%1} [R,L,NE]

The rewrite_map.pl file:

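 #!/usr/bin/perl
 # Apache writes one lookup key per line (here, the query string) to STDIN
 # and expects exactly one line back on STDOUT.
 $| = 1; # Turn off output buffering so Apache gets each reply immediately
 while (<STDIN>) {
        s/>/%3E/g;    # escape '>'
        s/</%3C/g;    # escape '<'
        s/\//%2F/g;   # escape '/'
        s/\\/%5C/g;   # escape '\'
        s/ /\+/g;     # encode spaces as '+'
        print $_;     # $_ still ends with the newline Apache sent
 }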
##
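
To check the substitutions without going through Apache, the same five rules can be wrapped in a function and run from the command line. This is only a minimal sketch: the name escape_query and the sample query string are made up for illustration and are not part of the setup above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: applies the same five substitutions as
# rewrite_map.pl to a single string and returns the result.
sub escape_query {
    my ($q) = @_;
    $q =~ s/>/%3E/g;    # escape '>'
    $q =~ s/</%3C/g;    # escape '<'
    $q =~ s/\//%2F/g;   # escape '/'
    $q =~ s/\\/%5C/g;   # escape '\'
    $q =~ s/ /+/g;      # encode spaces as '+'
    return $q;
}

# Made-up sample query string.
print escape_query('title=<a b>&path=x/y'), "\n";
```

For the sample input this prints title=%3Ca+b%3E&path=x%2Fy. Note that, like the original script, it leaves any '%' already in the string untouched.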

Looks like I'm not actually escaping bare '%' chars, since I hadn't run into those before in the URLs I need to handle. It would be trickier to add a regexp for that, since you need to distinguish an improper % from a % that actually starts a valid percent-escape (a % followed by two hex digits). Maybe something like:

   s/%(?![0-9A-Fa-f]{2})/%25/g;

The negative lookahead matches a % only when it is not followed by two hex digits, and it does not consume the characters after it. '/%25' would be a valid URI path representing the % char. '/%' is not.
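
A minimal sketch of that idea as a standalone script, assuming a bare '%' means "any % not followed by two hex digits". The function name escape_bare_percent and the sample paths are hypothetical, chosen only to show the behavior; this has not been run against real third-party URLs.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: escape every '%' that does not start a valid
# percent-escape (i.e. is not followed by two hex digits).
sub escape_bare_percent {
    my ($s) = @_;
    # The negative lookahead checks the next two characters without
    # consuming them, so valid escapes such as %2F are left alone.
    $s =~ s/%(?![0-9A-Fa-f]{2})/%25/g;
    return $s;
}

# Made-up sample paths to check by eye.
for my $in ('/%', '/100%', '/a%2Fb', '/50%zz', '/%4') {
    printf "%-8s -> %s\n", $in, escape_bare_percent($in);
}
```

With these inputs, '/%' becomes '/%25', '/100%' becomes '/100%25', '/a%2Fb' is unchanged, '/50%zz' becomes '/50%25zz', and '/%4' becomes '/%254'. One caveat of this approach: a literal '%' that happens to be followed by two hex digits cannot be told apart from a real escape, so it is left as-is.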

Hope this helps,

Jonathan


Robbie Allen wrote:
If you append an extra percent sign to a URL that gets passed to
mongrel, it will return a Bad Request error.  Kind of odd that
"http://localhost/%" causes a "Bad Request" instead of a "Not Found"
error.

Here is the error from the mongrel log:
HTTP parse error, malformed request (127.0.0.1):
#<Mongrel::HttpParserError: Invalid HTTP format, parsing fails.>

I'm using Nginx in front of mongrel.  I understand this is a bad URL,
but is there any way to have mongrel ignore lone percent signs?  Or
perhaps an Nginx rewrite rule that will encode extraneous percent signs?

--
Jonathan Rochkind
Digital Services Software Engineer
The Sheridan Libraries
Johns Hopkins University
410.516.8886 rochkind (at) jhu.edu
_______________________________________________
Mongrel-users mailing list
Mongrel-users@rubyforge.org
http://rubyforge.org/mailman/listinfo/mongrel-users
