On Oct 27, 2009, at 7:49 PM, eric casteleijn wrote:

We have the following setup:

Two nearly identical public-facing Django servers communicate with one CouchDB server. The CouchDB server is OAuth-authenticated, and people can access it directly (well, through an Apache proxy) if they have the tokens to do so. New users are signed up through these Django servers, which then add the user and their tokens to CouchDB (the user through a POST to _users and the tokens through PUTs to _config).
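For readers unfamiliar with the _config endpoint, here is a minimal sketch of what one of those token PUTs might look like. It is not the actual UbuntuOne code: the helper name build_config_put, the host and the token value are made up for illustration, and OAuth signing is left out. It assumes the usual PUT /_config/<section>/<key> form, with the new value sent as a JSON string in the body. The function only builds the request, it does not send it.

```python
import json
import urllib.parse
import urllib.request


def build_config_put(base_url, section, key, value):
    """Build (but do not send) a PUT /_config/<section>/<key> request.

    Hypothetical helper for illustration only; OAuth signing is omitted.
    """
    # Percent-encode both path segments so tokens containing '/' or '+'
    # cannot change the shape of the URL.
    path = "/_config/{}/{}".format(
        urllib.parse.quote(section, safe=""),
        urllib.parse.quote(key, safe=""),
    )
    # The config value travels as a JSON-encoded string.
    body = json.dumps(value).encode("utf-8")
    return urllib.request.Request(
        base_url.rstrip("/") + path,
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


if __name__ == "__main__":
    # Assumed local CouchDB address and a placeholder token.
    req = build_config_put(
        "http://localhost:5984", "oauth_token_users", "sometoken", "40693"
    )
    print(req.get_method(), req.full_url, req.data)
```

Passing the result to urllib.request.urlopen would issue the request; a 500 response carrying the timeout tuple shown below would then surface as an HTTPError.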

We see this failing a lot, now to the point where we think it fails all the time. (Since all those systems have separate logs, not all of which we have access to, this is not trivial to piece together.)

The errors the API servers get back all look like these (the lines starting with '(500'):

'2009-10-27 22:35:15,357 ERROR UbuntuOne.couch: failed to add ***** = 40693 to section [oauth_token_users] of local.ini:

(500, (u'timeout', u'{gen_server,call,\n [couch_config, \n {set,"oauth_token_users","*****","40693",true}]}'))'

'2009-10-27 22:35:20,399 ERROR UbuntuOne.couch: failed to add ***** = ***** to section [oauth_token_secrets] of local.ini:

(500, (u'timeout', u'{gen_server,call,\n [couch_config, \n {set,"oauth_token_secrets","*****",\n "*****", \n true}]}'))'

Corresponding errors in the couchdb.log look like:


My theory was that these writes to _config fail because the local.ini is somehow corrupted, but I can't access that file directly (since it has users' secrets) or copy it to my machine to test this theory. Helping someone who is allowed to see it look for anything weird is like searching for the proverbial needle in a haystack: we have lots of users, and users can have multiple tokens. Add to that the fact that you cannot ever delete a line from the .ini file (DELETEs against keys in _config just empty the value and leave a line like 'foo = \n')!

When I spoke to Jan on the channel, he proposed that the gen_server message inbox may be overflowing, so that the gen_server call times out.

Could that be, under high load, and how can we solve this? Can we increase the size of this inbox, or can we possibly have multiple processes handling the access? Whether it's high load or corruption or something else again, right now it looks like NO new tokens can be added, and hence no new users can use our system. In short: HALP!

Hi Eric,

I think we all know the long-term solution is to store OAuth information in a DB instead of the config file. Barring that, in the short term, some steps that can be taken to avoid these errors include:

1) extending or disabling the couch_config gen_server timeout. The default is 5000 milliseconds. This is a one-line patch.

2) Writing to the .ini file asynchronously. The in-memory configuration state can sustain update rates that are orders (plural) of magnitude larger than the update rate for the .ini file itself. With a bit of work you could cook it so that you still didn't respond to the PUT /_config/... request until the update was actually written to the file, while at the same time freeing the config server to handle more requests.
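To make option 2 concrete, here is a rough sketch of the idea in Python rather than Erlang. It is a toy model, not CouchDB code: the ConfigServer class, the put_config function and the one-line-per-update file format are all invented for illustration. The in-memory state is updated immediately, a background thread does the slow file write, and the caller still waits until its own update has reached the file.

```python
import os
import queue
import tempfile
import threading


class ConfigServer:
    """Toy model: memory is updated at once, the file on a background thread."""

    def __init__(self, ini_path):
        self.ini_path = ini_path
        self.state = {}  # (section, key) -> value
        self._lock = threading.Lock()
        self._pending = queue.Queue()
        writer = threading.Thread(target=self._write_loop, daemon=True)
        writer.start()

    def set(self, section, key, value):
        """Update memory right away; return an Event set once the file is written."""
        done = threading.Event()
        with self._lock:
            self.state[(section, key)] = value
        self._pending.put((section, key, value, done))
        return done

    def get(self, section, key):
        """Reads are served from memory and never wait for the disk."""
        with self._lock:
            return self.state.get((section, key))

    def _write_loop(self):
        while True:
            section, key, value, done = self._pending.get()
            # Appending one line stands in for the real (slow) .ini update.
            with open(self.ini_path, "a") as f:
                f.write("[{}] {} = {}\n".format(section, key, value))
                f.flush()
                os.fsync(f.fileno())
            done.set()  # wake the request that is waiting on this write


def put_config(server, section, key, value, timeout=None):
    """Model of the HTTP handler: reply only after the write has hit the file."""
    done = server.set(section, key, value)
    return done.wait(timeout)  # False means the write did not finish in time


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "local.ini")
    server = ConfigServer(path)
    print(put_config(server, "oauth_token_users", "sometoken", "40693", timeout=10))
```

The point of the split is that set and get return quickly no matter how slow the disk is, so the server's inbox does not back up behind file writes; only the individual request handler waits.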

In each case the response times for PUT /_config/... may become uncomfortably long, but at least you won't be serving 500s from couch.

Best,
Adam
