Perhaps I need to give some specific examples.

Example 1: we need to display some data to the user, but generating it is
slow, so we try to cache it:

try:
    cache = Cache.objects.get(start=?, stop=?)
except Cache.DoesNotExist:
    data = get_data(start=?, stop=?)
    cache = Cache.objects.create(start=?, stop=?, xxx=data.xxx,
                                 yyy=data.yyy, ...)
[ render response using cache ]

So the first step I can take is to make sure start and stop are uniquely
indexed. That way, if this runs concurrently, the other processes will fail
rather than creating multiple objects, which would make every later .get()
fail. Still not very good from the user's perspective.
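To sketch what that fallback looks like, here is the pattern with sqlite3
standing in for the ORM (the table, columns, and "expensive result" string
are all invented): the loser of the race catches IntegrityError and re-reads
the winner's row instead of failing the request.

```python
import sqlite3

# In-memory database standing in for the real one; the UNIQUE constraint
# on (start, stop) is the important part.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE cache (
        start TEXT, stop TEXT, data TEXT,
        UNIQUE (start, stop)
    )
""")

def get_or_compute(start, stop):
    """Return the cached data, computing it on a miss.

    If a concurrent request inserted the row first, the INSERT raises
    IntegrityError and we simply re-read the winner's row instead of
    blowing up the whole request.
    """
    row = conn.execute(
        "SELECT data FROM cache WHERE start = ? AND stop = ?",
        (start, stop)).fetchone()
    if row is not None:
        return row[0]
    # Stand-in for the expensive get_data() call.
    data = "expensive result for %s-%s" % (start, stop)
    try:
        conn.execute(
            "INSERT INTO cache (start, stop, data) VALUES (?, ?, ?)",
            (start, stop, data))
    except sqlite3.IntegrityError:
        # Lost the race: another process created the row; use theirs.
        row = conn.execute(
            "SELECT data FROM cache WHERE start = ? AND stop = ?",
            (start, stop)).fetchone()
        return row[0]
    return data

result = get_or_compute("2015-01-01", "2015-02-01")
```

Note this still calls get_data() in both processes on a concurrent miss; it
only stops the duplicate row from breaking subsequent reads.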

Ideally, as get_data is a DB-intensive operation, I only want to call it
once for a given start/stop; otherwise we use more resources than required.
I also risk being vulnerable to DoS attacks if I get a lot of requests at
the same time (you could argue this is a problem anyway, as start and stop
come from the user).

I think I could change that to something like this (if I understand Celery
correctly):

from app.tasks import get_data

try:
    cache = Cache.objects.get(start=?, stop=?)
except Cache.DoesNotExist:
    cache = Cache.objects.create(start=?, stop=?)
    cache.task = get_data.delay()
    cache.save()
    # cache.xxx and cache.yyy to be filled in by the Celery task

if cache.task is not None and not cache.task.ready():
    [ render processing message ]
else:
    [ render response using cache ]
However, unfortunately, I still have the same race condition.
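With the unique index from Example 1 in place, though, the create-then-enqueue
step can be made safe: whichever request wins the INSERT enqueues the task,
and everyone else just polls the placeholder row. A sketch of the shape, with
sqlite3 again standing in for the ORM and a plain list standing in for
get_data.delay():

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cache (start TEXT, stop TEXT, ready INTEGER, "
             "UNIQUE (start, stop))")

enqueued = []  # stands in for the Celery queue

def ensure_cache(start, stop):
    """Create the placeholder row and enqueue the task at most once."""
    try:
        conn.execute(
            "INSERT INTO cache (start, stop, ready) VALUES (?, ?, 0)",
            (start, stop))
    except sqlite3.IntegrityError:
        # Someone else won the race; their task is already queued.
        return
    enqueued.append((start, stop))  # only the INSERT winner enqueues

for _ in range(5):          # five concurrent-ish requests for the same range
    ensure_cache("a", "b")
# exactly one task was enqueued
```

In Django terms this is roughly get_or_create() plus "only call .delay() when
created is True", with the unique index making get_or_create itself safe.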


Example 2: I have a photo database that accepts imports from JavaScript.
The JavaScript will send a POST request for every file to be uploaded, with
the randomly generated name of the album to upload the photo to. As its
first step it does:

Album.objects.get_or_create(name=?)

There is an issue with the JavaScript that I haven't investigated yet: on
the first upload it sends the first two files concurrently, despite the
fact that I configured it to allow only one at a time. Regardless, being
able to support concurrent uploads is probably a desirable feature.

I can't create a unique index on name here, as I don't consider it an error
to have two albums with the same name.

Regardless, I don't want uploads to randomly fail either.

I'm thinking the solution here is to make sure the album is created before
the first upload, and maybe even reference it in the POST request by id
rather than by name.
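That two-step flow — create the album once up front, then have every upload
POST the id — removes the race entirely, since only one request ever creates
the album. A sketch (sqlite3 standing in for the ORM; table and function
names invented; the JavaScript side would hold on to the returned id):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE album (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE photo (album_id INTEGER, filename TEXT)")

def create_album(name):
    # Done once, before any uploads start. Duplicate names are fine
    # because every upload keys on the id, not the name.
    cur = conn.execute("INSERT INTO album (name) VALUES (?)", (name,))
    return cur.lastrowid

def upload_photo(album_id, filename):
    conn.execute("INSERT INTO photo (album_id, filename) VALUES (?, ?)",
                 (album_id, filename))

album_id = create_album("holiday")      # step 1: JS asks for an album id
upload_photo(album_id, "img001.jpg")    # step 2: concurrent uploads all
upload_photo(album_id, "img002.jpg")    #         reference that id
```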


Example 3: creating a new user. A user puts in a request for an account,
and an administrator has to approve the request. If two administrators
approve the same request at the same time, we could end up with two
accounts for the same user. Oops. Or with an error, if a unique index
caught, say, the duplicate username or email address.
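One way to make the approval safe is a guarded UPDATE: flip the request from
pending to approved in a single statement, and only create the account if a
row actually changed. Whichever admin's UPDATE matches first wins; the
other's matches zero rows and does nothing. A sketch (sqlite3 again; table
and column names invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE request (id INTEGER PRIMARY KEY, status TEXT)")
conn.execute("INSERT INTO request (id, status) VALUES (1, 'pending')")

accounts_created = []  # stands in for the real account creation

def approve(request_id):
    # Atomic test-and-set: the WHERE clause only matches while the
    # request is still pending, so at most one approval "wins".
    cur = conn.execute(
        "UPDATE request SET status = 'approved' "
        "WHERE id = ? AND status = 'pending'", (request_id,))
    if cur.rowcount == 1:
        accounts_created.append(request_id)  # create the account here
    # rowcount == 0 means the other admin got there first; do nothing

approve(1)   # first admin
approve(1)   # second admin, a moment later — harmless no-op
```

The same shape works in Django as a QuerySet.filter(...).update(...) call,
checking the number of rows it reports as updated.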


I guess I really need to think about minimising the risks, as opposed to
total extermination of all possible race conditions: focus on ensuring that
database integrity is maintained and that any damage (e.g. duplicate
records) is minimised.
_______________________________________________
melbourne-pug mailing list
[email protected]
https://mail.python.org/mailman/listinfo/melbourne-pug