I still haven't made any progress on this issue.

From debugging the raw packets, it appears Safari is resetting the connection after sending the complete request but before accepting the response, and then it resends the request. Other users report the same problem with Firefox on Linux, but I don't think it happens with Firefox on OS X for anyone.

With Safari running the CouchDB 0.7.2 test suite, everything worked perfectly. That version of CouchDB uses the Inets HTTP library.

-Damien


On Jul 23, 2008, at 3:01 PM, Damien Katz wrote:

Right now we are having a major problem with HTTP requests being retried. This problem is responsible for the test suite failures seen constantly in Safari (though others report similar failures in Firefox, I've not seen them myself). And it's not just test suite failures; some are seeing the same behavior in production.

The major symptoms of this problem:
1. Mysterious conflict - You get a conflict error saving a document to the db. When you examine the existing db document, it's already got your changes.
2. Duplicate document - When creating a new document via POST, you occasionally get 2 new documents created instead of one.

#1 is annoying but not too serious: no data is lost or corrupted. #2 is a bit more dangerous, because you could consider the database corrupted by the duplicate document (depending on what problems it would cause for your app).

What is happening in both these cases is the HTTP requests are getting sent and processed twice. The first request is given to CouchDB and is handled, but when CouchDB attempts to send the response, the connection is reset (apparently). Then another identical HTTP request comes in and the request is processed again.

I am not a TCP expert, but by viewing the network requests via tcpdump, it is obvious the request packets, 1 header and 1 body packet, are getting resent from the client to the server. I do not know if the packets are being resent at the TCP level, or if the HTTP client in Safari is retrying the request after getting a TCP error.

I do not know why the network error or subsequent resend is happening. I can only confirm that it *is* happening. If this is at the TCP level, then it means we definitely need to do away with the non-idempotent POST to create new documents.

I think we do anyway though. While this network error should not be happening, it did expose an interesting problem with our use of POST for document creation. The problem is that the generated id for the document is a UUID generated server side, so the server has no way to tell whether a request is a new request or a resend of an already processed request. It therefore generates another UUID and creates another new document. But if the UUID is generated by the client, then the resend will cause a conflict error because that UUID already exists in the DB, thus eliminating the duplicate data.
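To make the difference concrete, here is a minimal sketch in Python. It is a toy in-memory model, not CouchDB code: `ToyStore`, `Conflict`, and `send_twice` are hypothetical names invented for illustration, and `send_twice` only imitates a client that resends a request the server has already processed.

```python
import uuid


class Conflict(Exception):
    """Raised when a document id already exists (stands in for a conflict error)."""


class ToyStore:
    """Hypothetical in-memory stand-in for a database; not CouchDB code."""

    def __init__(self):
        self.docs = {}

    def post(self, body):
        # The server picks the id, so a resent request looks brand new.
        doc_id = uuid.uuid4().hex
        self.docs[doc_id] = body
        return doc_id

    def put(self, doc_id, body):
        # The client picks the id, so a resent request collides with the first.
        if doc_id in self.docs:
            raise Conflict(doc_id)
        self.docs[doc_id] = body
        return doc_id


def send_twice(handler, *args):
    """Simulate a client resending a request that was already processed."""
    results = []
    for _ in range(2):
        try:
            results.append(handler(*args))
        except Conflict:
            results.append("conflict")
    return results


# Server-generated id: the resend silently creates a second document.
post_store = ToyStore()
send_twice(post_store.post, {"name": "example"})
print(len(post_store.docs))  # 2

# Client-generated id: the resend is rejected and only one document exists.
put_store = ToyStore()
client_id = uuid.uuid4().hex
outcome = send_twice(put_store.put, client_id, {"name": "example"})
print(len(put_store.docs), outcome[1])  # 1 conflict
```

With a client-chosen id the resend degrades from symptom #2 (duplicate document) to symptom #1 (a conflict on a document that already holds your data), which is the less harmful of the two.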

However, we still need to figure out why this is happening in the first place. Why is the connection being reset and why is the request being retried?

If anyone wants to try to debug this, here is what I've been doing:
1. Run a packet sniffer for local port 5984 and start couchdb
2. Go to http://127.0.0.1/_utils/, click the "Test Suite" link
3. Run the "basics" test manually until you see a "conflict error" exception in the test result. (This exception stops the test from executing. I don't try to debug other test failures, since the test keeps on running after those failures.)
4. The last few requests will be the duplicated requests. There is information about the packets, but I don't know how to interpret it.

Any help and input appreciated.

-Damien
