Re: [PATCH] Retry HTTP requests on SSL connect failures

2012-10-02 Thread Drew Northup
On Mon, Oct 1, 2012 at 10:38 PM, Shawn Pearce spea...@spearce.org wrote:
 On Mon, Oct 1, 2012 at 3:18 PM, Jeff King p...@peff.net wrote:
 On Mon, Oct 01, 2012 at 02:23:06PM -0700, Shawn O. Pearce wrote:

 When libcurl fails to connect to an SSL server always retry the
 request once. Since the connection failed before the HTTP headers
 can be sent, no data has exchanged hands, so the remote side has
 not learned of the request and will not perform it twice.

 I find this a little distasteful just because we haven't figured out the
 actual _reason_ for the failure.

 No. I didn't try because I reproduced the issue on the initial GET
 /.../info/refs?service=git-upload-pack request with no authentication
 required. So the very first thing the remote-https process did was
 fail on an SSL error. During this run I was using a patched Git that
 had a different version of the retry logic, but it immediately retried
 and the retry was successful. At that point I decided the SSL session
 cache wasn't possibly relevant since the first request failed and the
 immediate retry was OK.

 Have you tried running your fails-after-a-few-hours request with other
 clients that don't have the problem and seeing what they do

 This is harder to reproduce than you think. It took me about 5 days of
 continuous polling to reproduce the error. And I have thus far only
 reproduced it against our production servers. This makes it very hard
 to test anything. Or to prove that any given patch is better than a
 different version.

The only sure way to make sure your patch works is to get your load
balancers Slashdotted first (reason noted in my previous mail on this
subject). For the sake of your relationship with your networking crew
I'd not advise doing that intentionally.


 which means it shouldn't really be affecting the general populace. So
 even though it feels like a dirty hack, at least it is self-contained,
 and it does fix a real-world problem. If your answer to the above
 questions is hunting this further is just not worth the effort, I can
 live with that.

 I am sort of at that point, but the hack is so ugly... yea, we
 shouldn't have to do this. Or pollute our code with it. I'm willing to
 go back and iterate on this further, but its going to be a while
 before I can provide any more information.

 How come the first hunk gets a nice for-loop and this one doesn't?

 Both hunks retry exactly once after an SSL connect error. I just tried
 to pick something reasonably clean to implement. This hunk seemed
 simple with the if, the other was uglier and a loop seemed the most
 simple way to get a retry in there.

If indeed the problem you are having is with a load balanced setup
then applying TCP/IP like back-off semantics is the right way to go.
The only reason the network stack isn't doing it for you is because
the load balancers wait for the SSL/TLS start before dumping the
excess (exceeding of license) SSL connections.

-- 
-Drew Northup
--
As opposed to vegetable or mineral error?
-John Pescatore, SANS NewsBites Vol. 12 Num. 59
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] Retry HTTP requests on SSL connect failures

2012-10-01 Thread Shawn O. Pearce
From: Shawn O. Pearce spea...@spearce.org

When libcurl fails to connect to an SSL server always retry the
request once. Since the connection failed before the HTTP headers
can be sent, no data has exchanged hands, so the remote side has
not learned of the request and will not perform it twice.

In the wild we have seen git-remote-https fail to connect to
some load-balanced SSL servers sporadically, while modern popular
browsers (e.g. Firefox and Chromium) have no trouble with the same
server pool.

Lets assume the site operators (Hi Google!) have a clue and are
doing everything they already can to ensure secure, successful
SSL connections from a wide range of HTTP clients. Implementing a
single level of retry in the client can make it more robust against
transient failure modes.
---
 http.c| 19 ---
 remote-curl.c |  2 ++
 2 files changed, 14 insertions(+), 7 deletions(-)

diff --git a/http.c b/http.c
index 345c171..953f2e6 100644
--- a/http.c
+++ b/http.c
@@ -784,7 +784,7 @@ static int http_request(const char *url, void *result, int 
target, int options)
struct slot_results results;
struct curl_slist *headers = NULL;
struct strbuf buf = STRBUF_INIT;
-   int ret;
+   int ret, attempts;
 
slot = get_active_slot();
slot-results = results;
@@ -820,12 +820,17 @@ static int http_request(const char *url, void *result, 
int target, int options)
curl_easy_setopt(slot-curl, CURLOPT_HTTPHEADER, headers);
curl_easy_setopt(slot-curl, CURLOPT_ENCODING, gzip);
 
-   if (start_active_slot(slot)) {
-   run_active_slot(slot);
-   ret = handle_curl_result(slot);
-   } else {
-   error(Unable to start HTTP request for %s, url);
-   ret = HTTP_START_FAILED;
+   for (attempts = 0; attempts  2; attempts++) {
+   if (start_active_slot(slot)) {
+   run_active_slot(slot);
+   if (slot-results-curl_result == 
CURLE_SSL_CONNECT_ERROR)
+   continue;
+   ret = handle_curl_result(slot);
+   } else {
+   error(Unable to start HTTP request for %s, url);
+   ret = HTTP_START_FAILED;
+   }
+   break;
}
 
curl_slist_free_all(headers);
diff --git a/remote-curl.c b/remote-curl.c
index a269608..04a379c 100644
--- a/remote-curl.c
+++ b/remote-curl.c
@@ -353,6 +353,8 @@ static int run_slot(struct active_request_slot *slot)
 
slot-results = results;
slot-curl_result = curl_easy_perform(slot-curl);
+   if (slot-curl_result == CURLE_SSL_CONNECT_ERROR)
+   slot-curl_result = curl_easy_perform(slot-curl);
finish_active_slot(slot);
 
err = handle_curl_result(slot);
-- 
1.7.12.1.590.g4bb1bc4

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Retry HTTP requests on SSL connect failures

2012-10-01 Thread Junio C Hamano
Shawn O. Pearce spea...@spearce.org writes:

 Lets assume the site operators (Hi Google!) have a clue and are
 doing everything they already can to ensure secure, successful
 SSL connections from a wide range of HTTP clients. Implementing a
 single level of retry in the client can make it more robust against
 transient failure modes.
 ---

Sign off?

  http.c| 19 ---
  remote-curl.c |  2 ++
  2 files changed, 14 insertions(+), 7 deletions(-)

 diff --git a/http.c b/http.c
 index 345c171..953f2e6 100644
 --- a/http.c
 +++ b/http.c
 @@ -784,7 +784,7 @@ static int http_request(const char *url, void *result, 
 int target, int options)
   struct slot_results results;
   struct curl_slist *headers = NULL;
   struct strbuf buf = STRBUF_INIT;
 - int ret;
 + int ret, attempts;
  
   slot = get_active_slot();
   slot-results = results;
 @@ -820,12 +820,17 @@ static int http_request(const char *url, void *result, 
 int target, int options)
   curl_easy_setopt(slot-curl, CURLOPT_HTTPHEADER, headers);
   curl_easy_setopt(slot-curl, CURLOPT_ENCODING, gzip);
  
 - if (start_active_slot(slot)) {
 - run_active_slot(slot);
 - ret = handle_curl_result(slot);
 - } else {
 - error(Unable to start HTTP request for %s, url);
 - ret = HTTP_START_FAILED;
 + for (attempts = 0; attempts  2; attempts++) {
 + if (start_active_slot(slot)) {
 + run_active_slot(slot);
 + if (slot-results-curl_result == 
 CURLE_SSL_CONNECT_ERROR)
 + continue;
 + ret = handle_curl_result(slot);
 + } else {
 + error(Unable to start HTTP request for %s, url);
 + ret = HTTP_START_FAILED;
 + }
 + break;
   }

Two naïve questions, that applies to this and the one in 
remote-curl.c::run_slot().

 (1) why only twice?
 (2) no need for wait a bit and then retry?

 diff --git a/remote-curl.c b/remote-curl.c
 index a269608..04a379c 100644
 --- a/remote-curl.c
 +++ b/remote-curl.c
 @@ -353,6 +353,8 @@ static int run_slot(struct active_request_slot *slot)
  
   slot-results = results;
   slot-curl_result = curl_easy_perform(slot-curl);
 + if (slot-curl_result == CURLE_SSL_CONNECT_ERROR)
 + slot-curl_result = curl_easy_perform(slot-curl);
   finish_active_slot(slot);
  
   err = handle_curl_result(slot);
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Retry HTTP requests on SSL connect failures

2012-10-01 Thread Junio C Hamano
Shawn O. Pearce spea...@spearce.org writes:

 + for (attempts = 0; attempts  2; attempts++) {
 + if (start_active_slot(slot)) {
 + run_active_slot(slot);
 + if (slot-results-curl_result == 
 CURLE_SSL_CONNECT_ERROR)
 + continue;

Is it safe to continue and let start_active_slot() to add the same
curl handle again when USE_CURL_MULTI is in effect?

 + ret = handle_curl_result(slot);
 + } else {
 + error(Unable to start HTTP request for %s, url);
 + ret = HTTP_START_FAILED;
 + }
 + break;
   }
  
   curl_slist_free_all(headers);
 diff --git a/remote-curl.c b/remote-curl.c
 index a269608..04a379c 100644
 --- a/remote-curl.c
 +++ b/remote-curl.c
 @@ -353,6 +353,8 @@ static int run_slot(struct active_request_slot *slot)
  
   slot-results = results;
   slot-curl_result = curl_easy_perform(slot-curl);
 + if (slot-curl_result == CURLE_SSL_CONNECT_ERROR)
 + slot-curl_result = curl_easy_perform(slot-curl);
   finish_active_slot(slot);
  
   err = handle_curl_result(slot);
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Retry HTTP requests on SSL connect failures

2012-10-01 Thread Jeff King
On Mon, Oct 01, 2012 at 02:23:06PM -0700, Shawn O. Pearce wrote:

 From: Shawn O. Pearce spea...@spearce.org
 
 When libcurl fails to connect to an SSL server always retry the
 request once. Since the connection failed before the HTTP headers
 can be sent, no data has exchanged hands, so the remote side has
 not learned of the request and will not perform it twice.
 
 In the wild we have seen git-remote-https fail to connect to
 some load-balanced SSL servers sporadically, while modern popular
 browsers (e.g. Firefox and Chromium) have no trouble with the same
 server pool.
 
 Lets assume the site operators (Hi Google!) have a clue and are
 doing everything they already can to ensure secure, successful
 SSL connections from a wide range of HTTP clients. Implementing a
 single level of retry in the client can make it more robust against
 transient failure modes.

I find this a little distasteful just because we haven't figured out the
actual _reason_ for the failure. That is, I'm not convinced this isn't
something that curl or the ssl library can't handle internally if we
would only configure them correctly. Did you ever follow up on tweaking
the session caching options for curl?

Have you tried running your fails-after-a-few-hours request with other
clients that don't have the problem and seeing what they do (I'm
thinking a small webkit harness or something would be the most
feasible)?

That being said, you did make it so that it only kicks in during ssl
connect errors:

 + for (attempts = 0; attempts  2; attempts++) {
 + if (start_active_slot(slot)) {
 + run_active_slot(slot);
 + if (slot-results-curl_result == 
 CURLE_SSL_CONNECT_ERROR)
 + continue;
 + ret = handle_curl_result(slot);
 + } else {
 + error(Unable to start HTTP request for %s, url);
 + ret = HTTP_START_FAILED;
 + }
 + break;

which means it shouldn't really be affecting the general populace. So
even though it feels like a dirty hack, at least it is self-contained,
and it does fix a real-world problem. If your answer to the above
questions is hunting this further is just not worth the effort, I can
live with that.

 diff --git a/remote-curl.c b/remote-curl.c
 index a269608..04a379c 100644
 --- a/remote-curl.c
 +++ b/remote-curl.c
 @@ -353,6 +353,8 @@ static int run_slot(struct active_request_slot *slot)
  
   slot-results = results;
   slot-curl_result = curl_easy_perform(slot-curl);
 + if (slot-curl_result == CURLE_SSL_CONNECT_ERROR)
 + slot-curl_result = curl_easy_perform(slot-curl);
   finish_active_slot(slot);

How come the first hunk gets a nice for-loop and this one doesn't?

Also, are these hunks the only two spots where this error can come up?
The first one does http_request, which handles smart-http GET requests.
the second does run_slot, which handles smart-http POST requests.

Some of the dumb http fetches will go through http_request. But some
will not. And I think almost none of dumb http push will.

-Peff
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Retry HTTP requests on SSL connect failures

2012-10-01 Thread Jeff King
On Mon, Oct 01, 2012 at 02:53:17PM -0700, Junio C Hamano wrote:

 Shawn O. Pearce spea...@spearce.org writes:
 
  +   for (attempts = 0; attempts  2; attempts++) {
  +   if (start_active_slot(slot)) {
  +   run_active_slot(slot);
  +   if (slot-results-curl_result == 
  CURLE_SSL_CONNECT_ERROR)
  +   continue;
 
 Is it safe to continue and let start_active_slot() to add the same
 curl handle again when USE_CURL_MULTI is in effect?

I _think_ so. We reuse the slots anyway. So the usual workflow would be
get_active_slot, then start_active_slot, then run_active_slot. This loop
omits get_active_slot, which is responsible for (re-)initializing a
bunch of aspects of the slot. But we wouldn't want that here, since it
would mean we'd have to set up our URL, callbacks, etc, again.

My only worry would be that the failed curl request actually ended up
writing some data or made some other state change. But since we are
explicitly catching only ssl connection failures, presumably that would
not have happened.

-Peff
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Retry HTTP requests on SSL connect failures

2012-10-01 Thread Junio C Hamano
Jeff King p...@peff.net writes:

 On Mon, Oct 01, 2012 at 02:53:17PM -0700, Junio C Hamano wrote:

 Shawn O. Pearce spea...@spearce.org writes:
 
  +  for (attempts = 0; attempts  2; attempts++) {
  +  if (start_active_slot(slot)) {
  +  run_active_slot(slot);
  +  if (slot-results-curl_result == 
  CURLE_SSL_CONNECT_ERROR)
  +  continue;
 
 Is it safe to continue and let start_active_slot() to add the same
 curl handle again when USE_CURL_MULTI is in effect?

 I _think_ so. 

It seems that at the beginning of curl_multi_add_handle() there is a
check to see if the incoming slot-curl has already been added to
some curl-multi-handle and the function would return an error code
CURLM_BAD_EASY_HANDLE without doing anything useful.  Doesn't the
second attempt to call start_active_slot() set the slot-in_use to
zero and return false, skipping the call to run_active_slot() in
that case?
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Retry HTTP requests on SSL connect failures

2012-10-01 Thread Drew Northup
On Mon, Oct 1, 2012 at 5:23 PM, Shawn O. Pearce spea...@spearce.org wrote:
 From: Shawn O. Pearce spea...@spearce.org

 When libcurl fails to connect to an SSL server always retry the
 request once. Since the connection failed before the HTTP headers
 can be sent, no data has exchanged hands, so the remote side has
 not learned of the request and will not perform it twice.

 In the wild we have seen git-remote-https fail to connect to
 some load-balanced SSL servers sporadically, while modern popular
 browsers (e.g. Firefox and Chromium) have no trouble with the same
 server pool.

 Lets assume the site operators (Hi Google!) have a clue and are
 doing everything they already can to ensure secure, successful
 SSL connections from a wide range of HTTP clients. Implementing a
 single level of retry in the client can make it more robust against
 transient failure modes.

Ok, this begs for some background info...
@Dayjob one of the many things I do is mange our load balancers
(redundant pair in our case). If the attempted SSL connections in one
bin (time-slot) exceeds the licensed size of that bin then the
excess attempts are just dropped on the floor. Normal web browsers
detect this initial failure and try again. This may be implemented
internally—I haven't checked.

Google, as I am sure you are well aware, doesn't rely upon a
traditional L2/L3 network level load balancing architecture.
Therefore, I would not attempt to argue that the results that apply to
their systems would apply much of anywhere else. (They have done
presentations publicly, which are archived on the 'net, about how they
do things.)

-- 
-Drew Northup
--
As opposed to vegetable or mineral error?
-John Pescatore, SANS NewsBites Vol. 12 Num. 59
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Retry HTTP requests on SSL connect failures

2012-10-01 Thread Shawn Pearce
On Mon, Oct 1, 2012 at 3:18 PM, Jeff King p...@peff.net wrote:
 On Mon, Oct 01, 2012 at 02:23:06PM -0700, Shawn O. Pearce wrote:

 When libcurl fails to connect to an SSL server always retry the
 request once. Since the connection failed before the HTTP headers
 can be sent, no data has exchanged hands, so the remote side has
 not learned of the request and will not perform it twice.

 I find this a little distasteful just because we haven't figured out the
 actual _reason_ for the failure. That is, I'm not convinced this isn't
 something that curl or the ssl library can't handle internally if we
 would only configure them correctly. Did you ever follow up on tweaking
 the session caching options for curl?

No. I didn't try because I reproduced the issue on the initial GET
/.../info/refs?service=git-upload-pack request with no authentication
required. So the very first thing the remote-https process did was
fail on an SSL error. During this run I was using a patched Git that
had a different version of the retry logic, but it immediately retried
and the retry was successful. At that point I decided the SSL session
cache wasn't possibly relevant since the first request failed and the
immediate retry was OK.

 Have you tried running your fails-after-a-few-hours request with other
 clients that don't have the problem and seeing what they do

This is harder to reproduce than you think. It took me about 5 days of
continuous polling to reproduce the error. And I have thus far only
reproduced it against our production servers. This makes it very hard
to test anything. Or to prove that any given patch is better than a
different version.

 (I'm
 thinking a small webkit harness or something would be the most
 feasible)?

So I suspect the contrib/persistent-https proxy thing in Go actually
papers over this problem by having the Go SSL client handle the
connection. But this is only based on a test I ran for several days
through that proxy that did not reproduce the bug. This doesn't mean
it doesn't reproduce with the proxy, it just means _I_ didn't get
lucky with an error in a ~48 hour run.

 which means it shouldn't really be affecting the general populace. So
 even though it feels like a dirty hack, at least it is self-contained,
 and it does fix a real-world problem. If your answer to the above
 questions is hunting this further is just not worth the effort, I can
 live with that.

I am sort of at that point, but the hack is so ugly... yea, we
shouldn't have to do this. Or pollute our code with it. I'm willing to
go back and iterate on this further, but its going to be a while
before I can provide any more information.

 diff --git a/remote-curl.c b/remote-curl.c
 index a269608..04a379c 100644
 --- a/remote-curl.c
 +++ b/remote-curl.c
 @@ -353,6 +353,8 @@ static int run_slot(struct active_request_slot *slot)

   slot-results = results;
   slot-curl_result = curl_easy_perform(slot-curl);
 + if (slot-curl_result == CURLE_SSL_CONNECT_ERROR)
 + slot-curl_result = curl_easy_perform(slot-curl);
   finish_active_slot(slot);

 How come the first hunk gets a nice for-loop and this one doesn't?

Both hunks retry exactly once after an SSL connect error. I just tried
to pick something reasonably clean to implement. This hunk seemed
simple with the if, the other was uglier and a loop seemed the most
simple way to get a retry in there.

 Also, are these hunks the only two spots where this error can come up?
 The first one does http_request, which handles smart-http GET requests.
 the second does run_slot, which handles smart-http POST requests.

Grrr. I thought I caught all of the curl perform calls but I guess I
missed the dumb transport.

 Some of the dumb http fetches will go through http_request. But some
 will not. And I think almost none of dumb http push will.

Well, don't use those? :-)
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html