In essence, ticket #7411 says that for SQLite (only, amongst our core
back-ends), having a cursor that contains partially read results will
cause write-commit problems when saving something to the database. In
code, which might be easier to understand, something like this will
sometimes be problematic:

        for obj in big_queryset:
            # Do stuff to "obj"
            obj.save()
        
We read objects into the queryset a bunch of rows at a time, but not all
at once. Experience has shown that people are often lazy when
constructing queries, use the results only once, and bail out of
iteration without reading everything once they've seen enough. So the
memory and performance savings from the way we work now (as opposed to
the way we worked prior to queryset-refactor being merged) are
significant in the large-result case.
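
For anyone who hasn't looked at that part of the code, the chunked
reading is roughly the following (a sketch only; the helper name and
chunk size are made up for illustration, not the real internals):

        # Sketch of chunked result reading; names and chunk size are
        # illustrative, not the actual Django internals.
        CHUNK_SIZE = 100

        def iter_rows(cursor):
            """Yield rows a chunk at a time instead of calling fetchall()."""
            while True:
                rows = cursor.fetchmany(CHUNK_SIZE)
                if not rows:
                    break
                for row in rows:
                    yield row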

I've just spent a while reading SQLite and PySQLite source code trying
to work out why this works as it does. I kind of understand it, but not
completely. However, importantly, the problem is real. It's present in
even the latest SQLite code and it appears to be intentional. So working
around this is important.

I don't think this should be a problem with "real" databases, since once
the cursor has constructed its result set, that shouldn't interact with
subsequent transactional work by another cursor. It's just something
about the way SQLite implements its atomic commits.
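
For anyone who wants to poke at this outside Django, the shape of the
interaction is roughly the following, using the stdlib sqlite3 (pysqlite)
module directly. The exact failure mode and error text vary between
SQLite and pysqlite versions, so treat this as a sketch of the problem
rather than a guaranteed test case:

        import os
        import sqlite3

        db_path = "/tmp/ticket7411-demo.db"
        if os.path.exists(db_path):
            os.remove(db_path)

        conn = sqlite3.connect(db_path)
        conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, flag INTEGER)")
        conn.executemany("INSERT INTO t (flag) VALUES (?)", [(0,)] * 500)
        conn.commit()

        # A cursor with partially read results: the SELECT statement is
        # still open at this point.
        reader = conn.cursor()
        reader.execute("SELECT id, flag FROM t")
        reader.fetchmany(100)

        # Writing and committing while that statement is still in progress
        # is where the trouble described above shows up on SQLite.
        writer = conn.cursor()
        writer.execute("UPDATE t SET flag = 1 WHERE id = 1")
        conn.commit()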

Possible Solutions
===================

(1) Do nothing. We could argue (which means "guess") that the situation
is rare enough that *if* you are saving as part of a read loop *and* you
think there will be lots of results (we can quantify the number) *and*
you are using SQLite, you should write:

        for obj in list(big_queryset):
            ...
        
The problem here is that the people most likely to be affected are those
just starting out. I'm all in favour of requiring people to read the
documentation and to know their tools before using them, but this is
subtle enough, and hard enough to diagnose, that it's going to crop up.

(2) For SQLite *only*, we go back to the pre-queryset-refactor
behaviour: all results are read in immediately upon accessing the
queryset. It uses more memory immediately (if you iterate through the
queryset fully, however, the end result will be the same) and it makes
the first access slower, since we have to create Python objects for
every row in the result set immediately. However, if you're using
SQLite, it's because you have small data sets that would reasonably fit
into memory anyway (or you should write more tightly filtered
querysets).
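
As a rough sketch of what (2) could look like (the feature flag and the
iterator function below are hypothetical names I've invented for this
mail, not existing code), the idea is that the backend advertises whether
chunked reads are safe and the iterator reads everything up front when
they aren't:

        # Hypothetical backend feature flag; the name is invented for
        # illustration, not an existing setting.
        class DatabaseFeatures:
            can_use_chunked_reads = False   # would be False for SQLite only

        def results_iter(cursor, features, chunk_size=100):
            """Yield result rows, reading the whole result set up front
            when chunked reads aren't safe (the proposed SQLite-only
            behaviour)."""
            if not features.can_use_chunked_reads:
                # Finish with the cursor immediately so later writes and
                # commits don't collide with a half-read statement.
                for row in cursor.fetchall():
                    yield row
                return
            while True:
                rows = cursor.fetchmany(chunk_size)
                if not rows:
                    break
                for row in rows:
                    yield row

That keeps the extra memory cost confined to the SQLite backend, while
the other back-ends keep chunking exactly as they do now.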

Personally, I suspect #2 is the most pragmatic solution, but I'd be
interested in hearing other opinions before I commit something.

Malcolm


