Xavier BOURGOUIN created HTTPCLIENT-2341:
--------------------------------------------

             Summary: DefaultRedirect strategy breaks reserved chars in URI path
                 Key: HTTPCLIENT-2341
                 URL: https://issues.apache.org/jira/browse/HTTPCLIENT-2341
             Project: HttpComponents HttpClient
          Issue Type: Bug
          Components: HttpClient (classic)
    Affects Versions: 4.5.14
         Environment: httpclient4 (4.5.14)
Linux/Ubuntu 22.04
            Reporter: Xavier BOURGOUIN
         Attachments: hc4normalize.tar.gz

When an HTTP response has a URI in the Location header with percent-encoded 
reserved chars (such as %40), these chars are replaced by their normalized 
equivalent (which is "@" in the case of %40). This seems to contradict RFC 
3986 ([https://www.rfc-editor.org/rfc/rfc3986#section-2.2] ), at least in the 
sense that for such reserved characters, the percent-encoded form doesn't 
have the same semantic meaning as the literal character, so the two aren't to 
be interpreted as equivalent.

One of the impacts is that it breaks any server / API that redirects clients to 
an S3 blob object (AWS S3 for instance) that happens to contain a %40 in 
the URI path (ex: location: https://<endpoint>/<some blob container>/foo%40bar.file)
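
To illustrate why the two forms are not interchangeable, here is a minimal sketch using only the JDK's java.net.URI (it is not part of the attached reproducer). The host storage.example.com, the class name and the helper names are made up for the example. It shows that the raw path keeps %40, that only the decoded view yields "@", and that a URI rewritten with a literal "@" no longer compares equal to the original.

```java
import java.net.URI;

public class RawVersusDecodedPath {

    // Returns the path exactly as written in the URI (percent-encoding kept).
    static String rawPath(String uri) {
        return URI.create(uri).getRawPath();
    }

    // Returns the path with percent-encoded octets decoded.
    static String decodedPath(String uri) {
        return URI.create(uri).getPath();
    }

    // Compares two URIs with java.net.URI.equals, which does not decode escapes.
    static boolean sameUri(String a, String b) {
        return URI.create(a).equals(URI.create(b));
    }

    public static void main(String[] args) {
        // Hypothetical redirect target, standing in for an S3-style blob URL.
        String location = "https://storage.example.com/container/foo%40bar.file";
        String rewritten = "https://storage.example.com/container/foo@bar.file";

        System.out.println("raw path:     " + rawPath(location));
        System.out.println("decoded path: " + decodedPath(location));
        System.out.println("same URI:     " + sameUri(location, rewritten));
    }
}
```

A server that signs or looks up objects by the raw path would therefore see a different resource once %40 has been rewritten to "@".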

 

Disabling URI normalization as shown below seems to work around it:
{code:java}
HttpGet httpget = new HttpGet("http://service-that-redirects");
httpget.setConfig(RequestConfig.custom().setNormalizeUri(false).build());
{code}
However, I'm not sure that's satisfactory if, as I suspect above, it is simply 
always wrong to "normalize" those reserved characters (and normalization is 
enabled by default).

Note that httpclient5 is fine (the percent-encoded %40 is preserved as it 
should be, and it seems there's no longer a toggle for the normalization 
behavior).
 
This past ticket https://issues.apache.org/jira/browse/HTTPCLIENT-2271 was 
discussing something very similar, except it was the other way around: some 
reserved characters were replaced by their percent-encoded equivalent. However, 
in the lengthy comment thread there, it seems a consensus was finally reached 
that for such chars, the percent-encoded value isn't equivalent to the 
original value and thus shouldn't be transformed. I believe that reasoning 
should hold in both directions, and thus should also apply to the case 
reported here.
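
As a rough sketch of what I would expect from a normalization that respects this reasoning: only percent-encoded *unreserved* characters (RFC 3986 section 2.3) get decoded, while any other triplet such as %40 is left in place, at most with its hex digits uppercased. The class and method names below are hypothetical and this is not HttpClient's actual implementation, just an illustration of the expected behavior using plain JDK code.

```java
public class UnreservedOnlyNormalizer {

    // RFC 3986 section 2.3: unreserved = ALPHA / DIGIT / "-" / "." / "_" / "~"
    static boolean isUnreserved(int c) {
        return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z')
                || (c >= '0' && c <= '9')
                || c == '-' || c == '.' || c == '_' || c == '~';
    }

    // Hypothetical helper: decodes only percent-encoded unreserved characters.
    // Every other valid triplet (including reserved chars such as %40) is kept
    // percent-encoded, with its hex digits uppercased.
    static String decodeUnreservedOnly(String rawPath) {
        StringBuilder out = new StringBuilder(rawPath.length());
        int i = 0;
        while (i < rawPath.length()) {
            char ch = rawPath.charAt(i);
            if (ch == '%' && i + 2 < rawPath.length()) {
                int hi = Character.digit(rawPath.charAt(i + 1), 16);
                int lo = Character.digit(rawPath.charAt(i + 2), 16);
                if (hi >= 0 && lo >= 0) {
                    int value = hi * 16 + lo;
                    if (isUnreserved(value)) {
                        // Safe to decode: the meaning of the URI is unchanged.
                        out.append((char) value);
                    } else {
                        // Keep the triplet as-is apart from hex digit case.
                        out.append('%')
                           .append(Character.toUpperCase(rawPath.charAt(i + 1)))
                           .append(Character.toUpperCase(rawPath.charAt(i + 2)));
                    }
                    i += 3;
                    continue;
                }
            }
            // Ordinary character or malformed escape: copy unchanged.
            out.append(ch);
            i++;
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(decodeUnreservedOnly("/foo%40bar")); // reserved "@" stays encoded
        System.out.println(decodeUnreservedOnly("/foo%7Ebar")); // unreserved "~" is decoded
    }
}
```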

I worked out a reproducer in the form of a small maven project that I'm 
attaching to this ticket, inspired by the one from that other ticket. It 
demonstrates the issue for httpclient 4.5.14 (but probably all of 4.x behaves 
the same) and compares it with httpclient5 (5.3.1). It should run directly 
with `mvn exec:java`.

 
In essence, what it does is:
 * Start a dummy http server with two services: '{*}/foo{*}', which redirects to 
'{*}/foo%40bar{*}', and one that listens on '{*}/foo@bar{*}'
 * Test httpclient4 (along with some other clients to demonstrate the 
differences in behavior) by sending a GET request to '/foo' and observing 
if and how it follows the redirect toward '/foo@bar', which shows whether 
*%40* was replaced by *@*

 
{code:java}
// Imports needed by this excerpt (JDK built-in HTTP server)
import com.sun.net.httpserver.HttpExchange;
import com.sun.net.httpserver.HttpHandler;
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;

// Dummy server
public static void main(String[] args) throws IOException, InterruptedException {
    HttpServer server = HttpServer.create(new InetSocketAddress(8000), 0);
    server.createContext("/foo", new RedirectHttpHandler());
    server.createContext("/foo@bar", new SuccessHttpHandler());
    server.setExecutor(null);
    server.start();

    // [... test client requests, run while the server is still up]

    server.stop(0);
}

// Answers every request with a 302 pointing at the percent-encoded path
public static class RedirectHttpHandler implements HttpHandler {
    @Override
    public void handle(HttpExchange t) throws IOException {
        t.getResponseHeaders().add("Location", "/foo%40bar");
        t.sendResponseHeaders(302, 0);
        OutputStream os = t.getResponseBody();
        os.close();
    }
}

// Logs the request URI it received, so a rewritten %40 becomes visible
public static class SuccessHttpHandler implements HttpHandler {
    @Override
    public void handle(HttpExchange t) throws IOException {
        System.out.println("[server] Received GET with URI: " + t.getRequestURI().toString());
        byte[] response = "You followed the redirect!".getBytes(StandardCharsets.UTF_8);
        t.sendResponseHeaders(200, response.length);
        OutputStream os = t.getResponseBody();
        os.write(response);
        os.close();
    }
}
{code}
The httpclient4 test looks like this:
{code:java}
CloseableHttpClient client = HttpClients.createDefault();

HttpGet httpget = new HttpGet("http://127.0.0.1:8000/foo");

CloseableHttpResponse response = client.execute(httpget);

if (response.getStatusLine().getStatusCode() == 302) {
    System.out.println("-> Location header: " + response.getFirstHeader("Location").getValue());
} else if (response.getStatusLine().getStatusCode() == 200) {
    System.out.println("-> Followed the redirect!");
} else {
    throw new RuntimeException("Unexpected response code: " + response.getStatusLine().getStatusCode());
}
{code}

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
