I still haven't made any progress on this issue.
From debugging the raw packets, it appears Safari is resetting the
connection after sending the complete request but before accepting the
response, and then it resends the request. On Linux, other users report
the same problem with Firefox, but as far as I know nobody has seen it
with Firefox on OS X.
When I ran the CouchDB 0.7.2 test suite in Safari, everything worked
perfectly. That version of CouchDB uses the Inets HTTP library.
-Damien
On Jul 23, 2008, at 3:01 PM, Damien Katz wrote:
Right now we are having a major problem with HTTP requests being
retried. This problem is responsible for the test suite failures
constantly seen in Safari (others report similar failures in Firefox,
though I've not seen them myself). And it's not just the test suite:
some people are seeing the same behavior in production.
The major symptoms of this problem:
1. Mysterious conflict - You get a conflict error saving a document
to the db. When you examine the existing db document, it already
has your changes.
2. Duplicate document - When creating a new document via POST, you
occasionally get 2 new documents created instead of one.
#1 is annoying but not too serious; no data is lost or corrupted. #2
is more dangerous, because you could consider the database corrupted
by the duplicate document (depending on what problems it causes for
your app).
What is happening in both cases is that the HTTP request is sent and
processed twice. The first request is handed to CouchDB and processed,
but when CouchDB attempts to send the response, the connection is
(apparently) reset. Then an identical HTTP request comes in and is
processed again.
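To make symptom #1 concrete, here is a minimal sketch, assuming a hypothetical in-memory store (ToyStore, save and deliver_twice are illustrative names, not CouchDB code) where every save must name the revision it was based on, loosely like CouchDB's _rev check. Replaying one update twice reproduces the "mysterious conflict": the first copy succeeds and bumps the revision, so the resend carries a stale revision and is rejected.

```python
class Conflict(Exception):
    """Raised when a save names a revision that is no longer current."""


class ToyStore:
    """Hypothetical in-memory model of revision-checked saves (not CouchDB's code)."""

    def __init__(self):
        self.docs = {}  # doc id -> (revision number, body)

    def save(self, doc_id, body, rev=None):
        current_rev = self.docs[doc_id][0] if doc_id in self.docs else None
        # A save must name the revision it was based on; anything else conflicts.
        if rev != current_rev:
            raise Conflict(doc_id)
        new_rev = (current_rev or 0) + 1
        self.docs[doc_id] = (new_rev, body)
        return new_rev


def deliver_twice(handler, *args, **kwargs):
    """Process a request, 'lose' the response, then process the resend."""
    handler(*args, **kwargs)  # handled, but the connection is reset on reply
    return handler(*args, **kwargs)  # the identical request comes in again


store = ToyStore()
rev = store.save("mydoc", {"title": "draft"})  # rev == 1
try:
    deliver_twice(store.save, "mydoc", {"title": "final"}, rev=rev)
except Conflict:
    # The client sees a conflict, yet the stored document has its changes.
    print("conflict, stored:", store.docs["mydoc"])
    # prints: conflict, stored: (2, {'title': 'final'})
```

The point of the sketch is only that a retried, already-applied update looks exactly like a concurrent edit to a revision check, which is why no data is lost in this case.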
I am not a TCP expert, but viewing the network traffic with tcpdump
makes it obvious that the request packets (1 header packet and 1 body
packet) are being resent from the client to the server. I do not know
whether the packets are being resent at the TCP level, or whether the
HTTP client in Safari is retrying the request after getting a TCP
error.
I do not know why the network error or subsequent resend is
happening. I can only confirm that it *is* happening. If this is at
the TCP level, then it means we definitely need to do away with the
non-idempotent POST to create new documents.
I think we need to anyway, though. While this network error should not
be happening, it did expose an interesting problem with our use of POST
for document creation. The problem is that the document id is a UUID
generated on the server, so the server has no way to tell whether a
request is a new request or a resend of an already processed one; it
simply generates another UUID and creates another new document. But if
the UUID is generated by the client, then the resend will cause a
conflict error, because that UUID already exists in the DB, and no
duplicate data is created.
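Here is a minimal sketch of that argument, again assuming a hypothetical in-memory database (ToyDB, post, put_new and deliver_twice are illustrative names, not CouchDB's API). post stands in for creation with a server-generated id; put_new stands in for creation with a client-chosen id, roughly what a PUT to a client-picked document id would do. Each request is delivered twice, as in the retry we are seeing.

```python
import uuid


class Conflict(Exception):
    """Raised when a document with the requested id already exists."""


class ToyDB:
    """Hypothetical in-memory stand-in for a database (not CouchDB's code)."""

    def __init__(self):
        self.docs = {}

    def post(self, body):
        # Server-side id: a resend is indistinguishable from a new request.
        doc_id = uuid.uuid4().hex
        self.docs[doc_id] = body
        return doc_id

    def put_new(self, doc_id, body):
        # Client-side id: a resend reuses the same id, so it collides.
        if doc_id in self.docs:
            raise Conflict(doc_id)
        self.docs[doc_id] = body
        return doc_id


def deliver_twice(handler, *args):
    """Process a request, 'lose' the response, then process the resend."""
    handler(*args)
    return handler(*args)


server_side = ToyDB()
deliver_twice(server_side.post, {"type": "note"})
print(len(server_side.docs))  # 2: the retry created a duplicate

client_side = ToyDB()
client_id = uuid.uuid4().hex  # generated once, before the first attempt
try:
    deliver_twice(client_side.put_new, client_id, {"type": "note"})
except Conflict:
    pass  # the resend is rejected instead of duplicating data
print(len(client_side.docs))  # 1
```

The design choice that matters is that the client generates the id once and reuses it for every attempt; generating a fresh UUID per attempt would bring the duplicates right back.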
However, we still need to figure out why this is happening in the
first place. Why is the connection being reset and why is the
request being retried?
If anyone wants to try to debug this, here is what I've been doing:
1. Run a packet sniffer on local port 5984 and start CouchDB
2. Go to http://127.0.0.1:5984/_utils/ and click the "Test Suite" link
3. Run the "basics" test manually until you see a "conflict error"
exception in the test result. (This exception stops the test from
executing. I don't try to debug other test failures, since the test
keeps on running after those failures.)
4. The last few requests will be the duplicated requests. The sniffer
shows information about these packets, but I don't know how to
interpret it.
Any help and input appreciated.
-Damien