#18557: get_or_create() causes a race condition with MySQL
-------------------------------+--------------------------------------
Reporter: foxwhisper | Owner: nobody
Type: Uncategorized | Status: reopened
Component: Core (URLs) | Version: 1.4
Severity: Normal | Resolution:
Keywords: | Triage Stage: Unreviewed
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------+--------------------------------------
Changes (by foxwhisper):
* status: closed => reopened
* resolution: wontfix =>
Comment:
@akaariai You raise a very good point about it breaking transaction
control. Enabling such a flag would depend on the developer knowing
exactly what they are doing and when it is safe to use.
I'm thinking something along these lines:
{{{
MyModel.objects.get_or_create(a=1, b=2, force_commit=True)
}}}
Along with a documentation update that says:
{{{
Certain databases (such as MySQL) don't gracefully handle get_or_create()
when multiple threads are writing to the same table. If throughput is high
enough, there is a small race window in which the unique index reports
that the key already exists, but any attempt to fetch that row fails.
The only way around this is to commit the transaction you are in, which
then allows you to fetch the row. However, if your get_or_create() is in a
transaction block with manual commits, then any queries issued before the
get_or_create() call will also be committed.
If you plan on using this feature, you must ensure that the
get_or_create() call is within a safe context where it is okay for the
previous queries to be committed.
}}}
Also, to touch on your question about what kind of usage pattern leads to
this race condition: it's fairly easy to trigger. You just need two or
more threads calling get_or_create() on the same table within a short
time of each other. A typical scenario is a queued import job that has to
do a get_or_create() on a popular item such as an IP address. If two jobs
encounter the same IP at the same time, the race condition occurs. This
affected us so badly because the majority of our work involves importing
and mangling large data sets, whereas a low-traffic site would almost
never see it happen.
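To make the sequence concrete, here is a toy model in plain Python. It is not Django or MySQL code: the `Database` and `Transaction` classes are hypothetical, and the sketch assumes the failure comes from snapshot-based reads (as with InnoDB's default REPEATABLE READ isolation), where reads keep using a snapshot until the transaction commits while the unique check sees committed rows. The `force_commit` argument only mimics the flag proposed above.

```python
class IntegrityError(Exception):
    """Raised when an insert violates the unique key."""


class Database:
    """Committed state shared by every transaction."""

    def __init__(self):
        self.rows = {}  # unique key -> value


class Transaction:
    """Toy transaction: snapshot reads, unique check against committed data."""

    def __init__(self, db):
        self.db = db
        self.snapshot = None  # taken at the first read, dropped on commit
        self.pending = {}     # this transaction's uncommitted writes

    def get(self, key):
        if self.snapshot is None:
            self.snapshot = dict(self.db.rows)
        if key in self.pending:
            return self.pending[key]
        return self.snapshot.get(key)

    def insert(self, key, value):
        if key in self.db.rows or key in self.pending:
            raise IntegrityError(key)
        self.pending[key] = value

    def commit(self):
        self.db.rows.update(self.pending)
        self.pending = {}
        self.snapshot = None


def get_or_create(txn, key, value, force_commit=False):
    """Return (value, created); optionally commit to refresh the snapshot."""
    found = txn.get(key)
    if found is not None:
        return found, False
    try:
        txn.insert(key, value)
        return value, True
    except IntegrityError:
        if force_commit:
            txn.commit()  # side effect: earlier pending writes are committed too
        found = txn.get(key)
        if found is None:
            raise  # the row exists, but this snapshot still cannot see it
        return found, False


db = Database()
job_a, job_b = Transaction(db), Transaction(db)

job_a.insert("log-1", "started")        # earlier write inside job A's transaction
job_a.get("10.0.0.1")                   # first read fixes job A's snapshot
job_b.insert("10.0.0.1", "from job B")
job_b.commit()                          # job B wins the race

try:
    get_or_create(job_a, "10.0.0.1", "from job A")
    outcome = "ok"
except IntegrityError:
    outcome = "IntegrityError"

result = get_or_create(job_a, "10.0.0.1", "from job A", force_commit=True)
```

In this model the first call fails because job A can neither insert the key nor see the row, while the second call succeeds only by committing, which also commits the earlier `log-1` write. That is the trade-off the documentation note would need to spell out.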
If the approach described above sounds good, please let me know and I'll
prepare a patch.
--
Ticket URL: <https://code.djangoproject.com/ticket/18557#comment:2>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.