Hi,

I'm the ticket OP.

As a user, I definitely expect that my ORM won't eat my records when I save 
something, even under concurrency. Furthermore, I expect it to work with the 
backend's default configuration, or, if I have to change that configuration, 
the ORM should refuse to work until it is right, or at least display a huge 
warning.
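
Just to illustrate the kind of safeguard I have in mind (a purely 
hypothetical sketch, not something that exists in Django today; the check id 
and wording are made up, only the MySQL variables are real), a system check 
could inspect the isolation level at startup and complain:

    # Hypothetical sketch of such a warning; not actual Django code.
    from django.core.checks import Warning, register
    from django.db import connections

    @register()
    def check_mysql_isolation_level(app_configs, **kwargs):
        messages = []
        for alias in connections:
            connection = connections[alias]
            if connection.vendor != "mysql":
                continue
            with connection.cursor() as cursor:
                # Older MySQL calls this tx_isolation; newer versions
                # spell it transaction_isolation.
                cursor.execute("SELECT @@tx_isolation")
                (level,) = cursor.fetchone()
            if level == "REPEATABLE-READ":
                messages.append(Warning(
                    "Database '%s' runs at REPEATABLE READ; concurrent M2M "
                    "updates may silently lose data." % alias,
                    id="mysql.W001",  # made-up check id
                ))
        return messages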

I agree with Karen that documenting that some functions don't work is not 
helpful, especially since this is Django's fault, not MySQL's. Sure, MySQL 
has some very weird isolation modes, and I definitely agree with Shai on 
that, but it is an ORM's job to know all that and to use the database in 
ways that are safe. I believe those safe ways exist, and that the problem in 
Django is not MySQL's default isolation level but the DELETE-SELECT-INSERT 
way of updating M2Ms. Incidentally, and supporting my point, this was 
changed in Django 1.9, and the data loss no longer happens under that same 
default isolation level.
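
To make it concrete, the old sequence boils down to something like the 
following toy, in-memory illustration (no real database involved, and all 
names are made up); it is the "wipe everything, re-check, re-insert" shape 
that is fragile:

    # Toy illustration of the pre-1.9 DELETE-SELECT-INSERT sequence; a
    # through-table row is just an (object, target_id) pair here.
    through_rows = {("article-42", 1), ("article-42", 2)}

    def old_style_m2m_set(rows, obj, new_ids):
        rows = {r for r in rows if r[0] != obj}            # DELETE all rows for obj
        existing = {tid for o, tid in rows if o == obj}    # SELECT remaining ids
        rows |= {(obj, tid) for tid in new_ids
                 if tid not in existing}                   # INSERT the missing ones
        return rows

    print(old_style_m2m_set(through_rows, "article-42", [2, 3]))
    # -> the rows for article-42 are now {2, 3}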

It seems that a READ COMMITTED default level would help with the particular 
schedule of operations described in the ticket, but I don't believe it will 
always work or that it won't cause problems elsewhere. For example, given a 
different schedule of the same operations used to update an M2M 
(DELETE-SELECT-INSERT), data loss could still happen under READ COMMITTED 
(see attachment).

Granted, this would be less of a problem, because it would (I think) require 
very tight synchronization of the two sets of operations. No such tight 
synchronization is needed in the ticket's case. It isn't evident from the 
step-by-step in the ticket, but a great deal of time can pass between the 
two SELECTs in sessions A and B and the DELETE in session A. This is because 
(if I understand correctly) a single transaction is started that updates all 
of the model's fields, and that same transaction also includes the update to 
the M2M field. Thus, the only requirement is that those two long 
transactions overlap.
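
A rough sketch of what I mean (this is not Django's actual admin code, and 
the model and names are mine; the point is only that the plain-field UPDATEs 
and the M2M update share one long transaction):

    from django.db import transaction

    def admin_style_save(article, tag_ids):
        with transaction.atomic():        # one transaction for the whole save
            article.save()                # UPDATEs for every regular field
            # ... inlines / related objects would be saved here too,
            # which can take a noticeable amount of time ...
            article.tags = tag_ids        # the M2M DELETE-SELECT-INSERT runs
                                          # last, inside the same transaction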

As a matter of fact, this means that the bigger the model, the easier it is 
to trigger the bug. I tried to replicate it in a minimal toy project and 
couldn't get the operations to overlap closely enough (though I only tried 
by double-clicking the save button quickly in the admin site). In my 
production project, I can reproduce the bug every time.

In my opinion, the fix should not be to set a different default isolation 
level, as that could trigger other problems that may be very hard to find; I 
was lucky to find this one. I think Django should instead use sequences of 
operations that are proven to be safe under specific isolation levels. Those 
may need to be tailored to each backend. Maybe DELETE-SELECT-INSERT is safe 
on PostgreSQL, but it simply isn't on MySQL, under either REPEATABLE READ or 
READ COMMITTED. So that sequence could be kept for PostgreSQL, while a 
different one should be found for MySQL.
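
To sketch the kind of safer sequence I mean (a rough simplification of the 
direction 1.9 took, not its actual code), the idea is to compute a diff and 
touch only the rows that changed, instead of wiping the whole relation:

    # Simplified sketch: compute the rows to change instead of deleting
    # everything and re-inserting.
    def plan_m2m_update(current_ids, new_ids):
        current, new = set(current_ids), set(new_ids)
        to_delete = current - new   # -> DELETE ... WHERE target_id IN to_delete
        to_insert = new - current   # -> one INSERT per id in to_insert
        return to_delete, to_insert

    # Changing {1, 2, 3} into {2, 3, 4} touches only rows 1 and 4:
    print(plan_m2m_update([1, 2, 3], [2, 3, 4]))   # ({1}, {4})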

On Monday, 21 March 2016 at 09:41:00 UTC+1, Shai Berger wrote:
>
> My recommendation to backport is based on the observation that the 
> peculiar REPEATABLE READ behavior is highly conducive to data loss in the 
> presence of concurrency, combined with a sense that it is not very well 
> known; I find it much more likely that the change will fix broken code than 
> break really working code. 
>
> On 21 March 2016 09:59:25 GMT+02:00, "Anssi Kääriäinen" 
> <akaa...@gmail.com> wrote:
>>
>> I'm strongly -1 on changing the default isolation level in a minor
>> release. We can recommend users switch the level and complain loudly
>> if they don't. But just changing the isolation level has potential for
>> breaking working code.
>>
>>  - Anssi
>>
>> On Mon, Mar 21, 2016 at 9:27 AM, Shai Berger <sh...@platonix.com> wrote:
>>
>>>  First of all, I would like to say that I strongly support the move to READ
>>>  COMMITTED, including backporting it to 1.8.x.
>>>
>>>  But we also need to explain: REPEATABLE READ is a higher transaction 
>>> isolation
>>>  level than READ COMMITTED. If you have problematic code, it should lead to
>>>  more deadlocks and/or transactions failing at commit time (compared to READ
>>>  COMMITTED), not to data loss. The reason we get data losses is MySql's 
>>> unique
>>>  interpretation of REPEATABLE READ. If you're interested in the details 
>>> (and if
>>>  you use MySql, you should be), read on.
>>>
>>>  With MySql's REPEATABLE READ, the "read" operations -- SELECT statements --
>>>  indeed act like they act in the usual REPEATABLE READ: Once you've read 
>>> some
>>>  table, changes made to that table by other transactions will not be visible
>>>  within your transaction. But "write" operations -- UPDATE, DELETE, INSERT 
>>> and
>>>  the like -- act as if they're under READ COMMITTED, affecting (and 
>>> affected by)
>>>  changes committed by other transactions. The result is, essentially, that
>>>  within a transaction, the reads are not guaranteed to be consistent with 
>>> the
>>>  writes [1].
>>>
>>>  In particular, in the bug[2] that caused this discussion, we get the 
>>> following
>>>  behavior in one transaction:
>>>
>>>          (1) BEGIN TRANSACTION
>>>
>>>          (2) SELECT ... FROM some_table WHERE some_field=some_value
>>>                  (1 row returned)
>>>
>>>          (3) (some other transactions commit)
>>>
>>>          (4) SELECT ... FROM some_table WHERE some_field=some_value
>>>                  (1 row returned, same as above)
>>>
>>>          (5) DELETE FROM some_table WHERE some_field=some_value
>>>                  (answer: 1 row deleted)
>>>
>>>          (6) SELECT ... FROM some_table WHERE some_field=some_value
>>>                  (1 row returned, same as above)
>>>
>>>          (7) COMMIT
>>>                  (the row that was returned earlier is no longer in the 
>>> database)
>>>
>>>  Take a minute to read this. Up to step (5), everything is as you would 
>>> expect;
>>>  you should find steps (6) and (7) quite surprising.
>>>
>>>  This happens because the other transactions in (3) deleted the row that is
>>>  returned in (2), (4) & (6), and inserted another one where
>>>  some_field=some_value; that other row is the row that was deleted in (5). 
>>> The
>>>  row that this transaction selects was not seen by the DELETE, and hence not
>>>  changed by it, and hence continues to be visible by the SELECTs in our
>>>  transaction. But when we commit, the row (which has been deleted) no longer
>>>  exists.
>>>
>>>  I have expressed elsewhere my opinion of this behavior as a general 
>>> database
>>>  feature, and feel no need to repeat it here; but I think that, if 
>>> possible, it
>>>  is Django's job as a framework to protect its users from it, at least as a
>>>  default.
>>>
>>>  On Monday 21 March 2016 02:25:37 Cristiano Coelho wrote:
>>>
>>>>  What performance changes can you expect from this change? It is probably
>>>>  the default on MySQL for a good reason.
>>>>
>>>
>>>  The Django project is usually willing to give up quite a lot of 
>>> performance in
>>>  order to prevent data losses. I agree that this default on MySql is 
>>> probably
>>>  for a reason, but I don't think it can be a good reason for Django.
>>>
>>>  Have fun,
>>>          Shai.
>>>
>>>  [1] https://dev.mysql.com/doc/refman/5.7/en/innodb-consistent-read.html
>>>  [2] https://code.djangoproject.com/ticket/26347
>>>
>>


Attachment: m2m_django_read_committed
