Hello,

I recently ran into a problem with wget. An application of mine keeps
its data spread over several HTTP/1.1 servers that know about each
other. A client can query any of them; if a server doesn't have the
requested information, it knows which other server does and returns
an HTTP redirect to the URL on that server. Some of the queries use
POST to specify which data they want, and those are also subject to
HTTP redirects, in which case the POST should be repeated on the next
server.

Usually, after an HTTP redirect the client repeats the query with a
GET. That is the Post/Redirect/Get pattern
(http://en.wikipedia.org/wiki/Post/Redirect/Get), used by web forms to
send the user to another web page instead of generating the HTML
content on the submit URL. To resolve the ambiguity over which method
to use after a redirect, HTTP/1.1 introduced status code 307, which
indicates that the server expects the client to try the next URL using
the same method (see
http://en.wikipedia.org/wiki/List_of_HTTP_status_codes#3xx_Redirection;
RFC 2616 is the one that defines this new code, but unfortunately I
found it not very explicit about this behaviour). The redirects I
referred to above are therefore all implemented as 307 redirects.
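
As a minimal sketch of the rule described above, the choice of method
for the follow-up request could look like the C below. The helper name
method_after_redirect is hypothetical and this is not wget's actual
code; it only assumes that 307 keeps the method and that any other
redirect turns a POST into a GET, as common clients do.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper, not part of wget: pick the HTTP method for the
   request that follows a redirect with the given status code.  */
static const char *
method_after_redirect (int status, const char *method)
{
  assert (method != NULL);

  /* 307 Temporary Redirect: repeat the request with the same method,
     so a POST stays a POST.  */
  if (status == 307)
    return method;

  /* Any other redirect (301, 302, 303): assume the usual
     Post/Redirect/Get behaviour and turn a POST into a GET.  */
  if (strcmp (method, "POST") == 0)
    return "GET";

  return method;
}
```

With this sketch, method_after_redirect (307, "POST") yields "POST",
while method_after_redirect (302, "POST") yields "GET".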

When using an HTML form in Firefox this works just fine, but when I
tried to automate it I noticed that wget does not repeat the POST
after a 307 redirect. I tried curl and saw that it handles 307
redirects correctly, so for now I had to resort to curl for my
scripts, which is not ideal since wget is my tool of choice...

So I decided to fix the issue in wget, to make it behave like both
Firefox and curl, and to respect the "spirit" (if not the "letter") of
the RFC.

Attached to this e-mail you will find a patch created against the
latest (at this time) wget 1.12-2443 from the bzr repository.

Also attached is a tarball with some files to help test the issue.
The wget307test.tgz file should be unpacked directly under /var/www
(or whatever the Apache root htdocs directory is). It contains an
.htaccess that sets up everything needed, including the 307 redirect
from /wget307test/redirect.cgi, as long as "AllowOverride all" is set
for that directory in the main Apache config file. actual.cgi is a
Perl CGI that receives the redirected requests and tests whether the
redirect worked. testform.html is a form that can be used to test,
from a web browser such as Firefox, a submit to a URL that returns a
307 redirect. testcurl.sh is a shell script that runs the test using
curl (I tested it with curl 7.19.7 and it works). testwget.sh is a
shell script that runs the test using wget (I tested it with vanilla
1.12 and with unpatched 1.12-2443 from bzr, and it does not work with
either).

The output of the CGI (which is what each test displays) is plain
text. It prints a line indicating whether the test worked, based on a
submitted parameter that is lost if the POST was translated to a GET,
as happens with wget. It also prints another submitted variable (a
little sanity check for the CGI) and which method (GET or POST) was
used for the request to the CGI.

I also updated the documentation (wget.texi, which is used to generate
all the others, including the man page) and the ChangeLog, but I may
have forgotten something. Feel free to change anything in the patch
that you feel could be done better.

I hope this helps, and I really hope to see this fix included in the
next official release of wget! :-D

Keep up the great work building this awesome web client tool!

Cheers,
Filipe
=== modified file 'ChangeLog'
--- ChangeLog	2010-10-24 19:45:30 +0000
+++ ChangeLog	2010-11-20 05:58:09 +0000
@@ -1,3 +1,7 @@
+2010-11-20  Filipe Brandenburger <[email protected]>
+
+	* Respect HTTP/1.1 307 redirect code, by preserving same request method (POST).
+
 2010-10-24  Jessica McKellar <[email protected]> (tiny change)
 
 	* NEWS: Mention the change to the the summary for recursive downloads.

=== modified file 'NEWS'
--- NEWS	2010-11-19 17:26:14 +0000
+++ NEWS	2010-11-20 06:05:56 +0000
@@ -24,6 +24,8 @@
 
 ** Print diagnostic messages to stderr, not stdout.
 
+** Support HTTP/1.1 307 redirects, keeping the request method.
+
 ** Do not use an additional HEAD request when --content-disposition is used,
    but use directly GET.
 

=== modified file 'doc/wget.texi'
--- doc/wget.texi	2010-10-28 22:20:31 +0000
+++ doc/wget.texi	2010-11-20 06:04:04 +0000
@@ -1467,12 +1467,12 @@
 can't know that until it receives a response, which in turn requires the
 request to have been completed -- a chicken-and-egg problem.
 
-Note: if Wget is redirected after the POST request is completed, it
-will not send the POST data to the redirected URL.  This is because
-URLs that process POST often respond with a redirection to a regular
-page, which does not desire or accept POST.  It is not completely
-clear that this behavior is optimal; if it doesn't work out, it might
-be changed in the future.
+Note: if Wget is redirected with an HTTP status code other than 307
+after the POST request is completed, it will not send the POST data
+to the redirected URL.  This is because URLs that process POST often
+respond with a redirection to a regular page, which does not desire
+or accept POST.  To explicitly request a POST after a redirect, an
+HTTP/1.1 compliant server should return a 307 redirect status code.
 
 This example shows how to log to a server using POST and then proceed to
 download the desired pages, presumably only accessible to authorized

=== modified file 'src/http.c'
--- src/http.c	2010-11-19 16:14:21 +0000
+++ src/http.c	2010-11-20 05:56:31 +0000
@@ -2319,6 +2319,15 @@
             CLOSE_INVALIDATE (sock);
           xfree_null (type);
           xfree (head);
+          /* From RFC2616: The status codes 303 and 307 have
+             been added for servers that wish to make unambiguously
+             clear which kind of reaction is expected of the client.
+             
+             A 307 should be redirected using the same method,
+             in other words, a POST should be preserved and not
+             converted to a GET in that case. */
+          if (statcode == HTTP_STATUS_TEMPORARY_REDIRECT)
+            return NEWLOCATION_KEEP_POST;
           return NEWLOCATION;
         }
     }
@@ -2798,6 +2807,7 @@
           ret = err;
           goto exit;
         case NEWLOCATION:
+        case NEWLOCATION_KEEP_POST:
           /* Return the new location to the caller.  */
           if (!*newloc)
             {
@@ -2808,7 +2818,7 @@
             }
           else
             {
-              ret = NEWLOCATION;
+              ret = err;
             }
           goto exit;
         case RETRUNNEEDED:

=== modified file 'src/retr.c'
--- src/retr.c	2010-10-21 11:27:31 +0000
+++ src/retr.c	2010-11-20 05:50:57 +0000
@@ -763,7 +763,7 @@
       proxy_url = NULL;
     }
 
-  location_changed = (result == NEWLOCATION);
+  location_changed = (result == NEWLOCATION || result == NEWLOCATION_KEEP_POST);
   if (location_changed)
     {
       char *construced_newloc;
@@ -837,12 +837,17 @@
         }
       u = newloc_parsed;
 
-      /* If we're being redirected from POST, we don't want to POST
+      /* If we're being redirected from POST, and we received a
+         redirect code different than 307, we don't want to POST
          again.  Many requests answer POST with a redirection to an
          index page; that redirection is clearly a GET.  We "suspend"
          POST data for the duration of the redirections, and restore
-         it when we're done. */
-      if (!post_data_suspended)
+         it when we're done.
+	 
+	 RFC2616 HTTP/1.1 introduces code 307 Temporary Redirect
+	 specifically to preserve the method of the request.
+	 */
+      if (result != NEWLOCATION_KEEP_POST && !post_data_suspended)
         SUSPEND_POST_DATA;
 
       goto redirected;

=== modified file 'src/wget.h'
--- src/wget.h	2010-09-29 11:34:09 +0000
+++ src/wget.h	2010-11-20 03:46:19 +0000
@@ -352,7 +352,7 @@
   PROXERR,
   /* 50  */
   AUTHFAILED, QUOTEXC, WRITEFAILED, SSLINITFAILED, VERIFCERTERR,
-  UNLINKERR
+  UNLINKERR, NEWLOCATION_KEEP_POST
 } uerr_t;
 
 /* 2005-02-19 SMS.

Attachment: wget307test.tgz
Description: GNU Zip compressed data
