#29563: SQLite and Queryset.iterator() support
-------------------------------------+-------------------------------------
               Reporter:  Andrew     |          Owner:  nobody
  Brown                              |
                   Type:             |         Status:  new
  Uncategorized                      |
              Component:  Database   |        Version:  master
  layer (models, ORM)                |
               Severity:  Normal     |       Keywords:
           Triage Stage:             |      Has patch:  0
  Unreviewed                         |
    Needs documentation:  0          |    Needs tests:  0
Patch needs improvement:  0          |  Easy pickings:  0
                  UI/UX:  0          |
-------------------------------------+-------------------------------------
 I’m writing a non-web app that uses Django as the ORM and SQLite as the
 backend, and I have a need to iterate over large tables efficiently.
 Django’s documentation says Queryset.iterator() does not work on SQLite
 [#ref1 (1)] [#ref2 (2)] but I tried it anyways, and discovered that it
 works: results are not read into memory in entirety, but streamed from the
 database in chunks. I traced this to an apparent logic error in the
 SQLCompiler.execute_sql() method [#ref3 (3)] and the interpretation of the
 can_use_chunked_reads flag. More on this below.

 But I was curious why it didn’t crash or get some error from the SQLite
 library despite the Django documentation saying SQLite doesn’t support
 streaming queries. Careful reading of the SQLite documentation seems to
 indicate there’s no problem reading a query in one cursor and writing to
 the database (even the same table) in another. There’s a caveat about
 isolation [#ref4 (4)] to watch out for, but otherwise seems like a
 perfectly supported mode of operation.

 So I dug into Django’s history and I came upon Django bug #7411. This bug
 was written in June 2008. At the time, the latest version of SQLite was
 3.5.9, which didn’t support database commits interleaved with partially
 read cursors. So the workaround implemented was to read the entire query
 result into memory. The database feature flag “can_use_chunked_reads” was
 added, and this was set to False in the SQLite backend. Code was added to
 the SQLCompiler.execute_sql() method to wrap the result iterator in
 list(result) if that flag was false [3b37c8151a]. I call this a
 “workaround” because it’s working around a limitation SQLite had at the
 time.

 However, later that year SQLite version 3.6.5 was released, which added
 the ability to run COMMIT simultaneously with other read operations [#ref5
 (5)]. I was not able to reproduce bug #7411 using SQLite versions >=
 3.6.5. This version was released in November 2008.

 So about that bug in SQLCompiler.execute_sql(). From what I can tell,
 Django ticket #16614 committed a change [f3b7c05936] which introduced a
 logic bug to the relevant code in June 2016, reproduced below [#ref3 (3)]:

 {{{#!python
 if not chunked_fetch and not
 self.connection.features.can_use_chunked_reads:
     try:
         # If we are using non-chunked reads, we return the same data
         # structure as normally, but ensure it is all read into memory
         # before going any further. Use chunked_fetch if requested.
         return list(result)
     finally:
         # done with the cursor
         cursor.close()
 return result
 }}}

 Notice the condition will skip the if statement body if either
 chunked_fetch is True or if can_use_chunked_reads is True. Since calling
 queryset.iterator() sets chunked_fetch to True, the can_use_chunked_reads
 flag is ignored and the if statement skipped. I believe the “and” should
 be an “or”, which would return list(result) if the database doesn’t
 support chunked reads regardless of the chunked_fetch value.

 Regardless of that bug, I suggest the workaround be removed entirely since
 it hasn’t been necessary since 2008 and hasn’t been functional since 2016.
 If we need to support the can_use_chunked_reads flag for compatibility
 with custom and third-party database backends, then we can fix the logic
 error and set SQLite’s flag to True. Documentation should be updated
 accordingly.

 If there is a reason for keeping chunked reads disabled for SQLite (such
 as the SQLite caveats on isolation [#ref4 (4)], or needing to support
 older versions of SQLite <3.6.5), then the logic error should be fixed.

 I’m willing to put in a pull request, but since the situation is quite
 complicated (the workaround also inadvertently helped avoid a related bug
 in Python’s sqlite3 driver, [[https://bugs.python.org/issue10513|#10513]],
 fixed as of Python 2.7.13 and 3.5.3) I wanted to keep this report as short
 as I could to get some feedback first.

 [=#ref1 (1)] https://docs.djangoproject.com/en/2.0/ref/models/querysets
 /#without-server-side-cursors
 [=#ref2 (2)]
 
https://github.com/django/django/blob/2.0.7/django/db/backends/sqlite3/features.py#L9
 [=#ref3 (3)]
 
https://github.com/django/django/blob/2.0.7/django/db/models/sql/compiler.py#L1096
 [=#ref4 (4)] https://sqlite.org/isolation.html (see final paragraph “No
 Isolation Between Operations On The Same Database Connection”)
 [=#ref5 (5)] https://sqlite.org/releaselog/3_6_5.html

-- 
Ticket URL: <https://code.djangoproject.com/ticket/29563>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.

-- 
You received this message because you are subscribed to the Google Groups 
"Django updates" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to django-updates+unsubscr...@googlegroups.com.
To post to this group, send email to django-updates@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/django-updates/050.6918a06e655a75340b8a97844eb0f4e2%40djangoproject.com.
For more options, visit https://groups.google.com/d/optout.

Reply via email to