OK, so the `BaseJettyTest` is written to redirect POST every 20 requests, and 
fail with 500 error every 10 requests that aren't redirected.

```java
            } else if (request.getMethod().equals("POST")) {
               // don't redirect large objects
               if (request.getContentLength() < 10240 && 
redirectEveryTwentyRequests(request, response))
                  return;
               if (failEveryTenRequests(request, response))
                  return;
```

Below, I'm saying I did stuff not as I hunted the code down, just lazy and 
figured it was probably me.

I wrote this to ensure we could follow redirects on POST as this could happen 
in amazon with virtual-hosted buckets.  I also wanted to ensure users could 
silently recover from 500s on POST as AWS query apis had many safe commands 
that were tunneled over POST.  Note this behavior has since screwed with 
rackspace in particular who have occasionally returned 500 *after* processing 
the POST!

It is probably more safe to retry a POST before it was fully sent (IOException) 
vs retrying based a server response code.

For the record, the only safe write is PUT (maybe PATCH in some cases): it may 
be entirely better to remove this contract or ensure it does *not* retry POST!  
This would imply adding logic at each api level to decide which commands are 
really safe even though they are presented as POST requests.  Note that this 
issue isn't black/white, so you can also see the [issue on okhttp wrt silent 
retries](https://github.com/square/okhttp/issues/258) (and disabling them).


Now back to these tests.  3 tests failed, right?  Notice all of them had 
`testPost` prefix.  The tests are written to ensure that input streams are not 
retried (as this ensures input stream is not rebuffered), while buffered 
content is retried.

  * `testPostAsInputStream` is written to fail on Exception vs retrying.  This 
test checked to see if it failed at least one of out 5 requests and it didn't.  
Now, BaseJettyTest is written to fail one out of every 10 POSTs, so this could 
be more just bad luck.  At any rate, this test could be rewritten to send a 
flag on POST which jetty will always fail on vs relying on random failures or 
redirect roulette.
  * `testPostBinder` and `testPostContentLanguage` failed with an EOF error on 
`POST http://localhost:8124/`.  The port 8124 hints that these failures were 
after a redirect.  I'd focus on redirect.

If I had to rewrite these tests, which were crafted in 2009, today.  I'd use 
MockWebServer and isolate only the behaviors we want, not relying on state like 
how many other tests may have POST'ed something, etc.  At the cost of firing up 
one (or 2 in the case of redirects) MWS per test, flakiness like this could be 
immediately identifiable regardless of how they are run.  Well worth a refactor 
imho.  I'd probably also remove the logic to retry unsafe http methods on 500.  
You can look at 
[feign](https://github.com/Netflix/feign/blob/master/core/src/test/java/feign/FeignTest.java#L414)
 for some opinions of retries I've had recently, ex. retrying on failed ssl 
handshake.

Suspicions about these failures: It is possible that okhttp is retrying the 
POST that is expected to fail.  It is possible that the mechanics of a redirect 
are misaligned.  See if you can narrow down with MWS tests.  Here's an [example 
of a 2 server 
test](https://github.com/Netflix/feign/blob/master/ribbon/src/test/java/feign/ribbon/LoadBalancingTargetTest.java#L40),
 if you want to isolate redirects.  Caution.. these are guesses! The real 
problems could be elsewhere.

Anyway, I hope these hints help!

---
Reply to this email directly or view it on GitHub:
https://github.com/jclouds/jclouds/pull/232#issuecomment-30581270

Reply via email to