#29563: SQLite and Queryset.iterator() support -------------------------------------+------------------------------------- Reporter: Andrew | Owner: nobody Brown | Type: | Status: new Uncategorized | Component: Database | Version: master layer (models, ORM) | Severity: Normal | Keywords: Triage Stage: | Has patch: 0 Unreviewed | Needs documentation: 0 | Needs tests: 0 Patch needs improvement: 0 | Easy pickings: 0 UI/UX: 0 | -------------------------------------+------------------------------------- I’m writing a non-web app that uses Django as the ORM and SQLite as the backend, and I have a need to iterate over large tables efficiently. Django’s documentation says Queryset.iterator() does not work on SQLite [#ref1 (1)] [#ref2 (2)] but I tried it anyways, and discovered that it works: results are not read into memory in entirety, but streamed from the database in chunks. I traced this to an apparent logic error in the SQLCompiler.execute_sql() method [#ref3 (3)] and the interpretation of the can_use_chunked_reads flag. More on this below.
But I was curious why it didn’t crash or get some error from the SQLite library despite the Django documentation saying SQLite doesn’t support streaming queries. Careful reading of the SQLite documentation seems to indicate there’s no problem reading a query in one cursor and writing to the database (even the same table) in another. There’s a caveat about isolation [#ref4 (4)] to watch out for, but otherwise seems like a perfectly supported mode of operation. So I dug into Django’s history and I came upon Django bug #7411. This bug was written in June 2008. At the time, the latest version of SQLite was 3.5.9, which didn’t support database commits interleaved with partially read cursors. So the workaround implemented was to read the entire query result into memory. The database feature flag “can_use_chunked_reads” was added, and this was set to False in the SQLite backend. Code was added to the SQLCompiler.execute_sql() method to wrap the result iterator in list(result) if that flag was false [3b37c8151a]. I call this a “workaround” because it’s working around a limitation SQLite had at the time. However, later that year SQLite version 3.6.5 was released, which added the ability to run COMMIT simultaneously with other read operations [#ref5 (5)]. I was not able to reproduce bug #7411 using SQLite versions >= 3.6.5. This version was released in November 2008. So about that bug in SQLCompiler.execute_sql(). From what I can tell, Django ticket #16614 committed a change [f3b7c05936] which introduced a logic bug to the relevant code in June 2016, reproduced below [#ref3 (3)]: {{{#!python if not chunked_fetch and not self.connection.features.can_use_chunked_reads: try: # If we are using non-chunked reads, we return the same data # structure as normally, but ensure it is all read into memory # before going any further. Use chunked_fetch if requested. return list(result) finally: # done with the cursor cursor.close() return result }}} Notice the condition will skip the if statement body if either chunked_fetch is True or if can_use_chunked_reads is True. Since calling queryset.iterator() sets chunked_fetch to True, the can_use_chunked_reads flag is ignored and the if statement skipped. I believe the “and” should be an “or”, which would return list(result) if the database doesn’t support chunked reads regardless of the chunked_fetch value. Regardless of that bug, I suggest the workaround be removed entirely since it hasn’t been necessary since 2008 and hasn’t been functional since 2016. If we need to support the can_use_chunked_reads flag for compatibility with custom and third-party database backends, then we can fix the logic error and set SQLite’s flag to True. Documentation should be updated accordingly. If there is a reason for keeping chunked reads disabled for SQLite (such as the SQLite caveats on isolation [#ref4 (4)], or needing to support older versions of SQLite <3.6.5), then the logic error should be fixed. I’m willing to put in a pull request, but since the situation is quite complicated (the workaround also inadvertently helped avoid a related bug in Python’s sqlite3 driver, [[https://bugs.python.org/issue10513|#10513]], fixed as of Python 2.7.13 and 3.5.3) I wanted to keep this report as short as I could to get some feedback first. [=#ref1 (1)] https://docs.djangoproject.com/en/2.0/ref/models/querysets /#without-server-side-cursors [=#ref2 (2)] https://github.com/django/django/blob/2.0.7/django/db/backends/sqlite3/features.py#L9 [=#ref3 (3)] https://github.com/django/django/blob/2.0.7/django/db/models/sql/compiler.py#L1096 [=#ref4 (4)] https://sqlite.org/isolation.html (see final paragraph “No Isolation Between Operations On The Same Database Connection”) [=#ref5 (5)] https://sqlite.org/releaselog/3_6_5.html -- Ticket URL: <https://code.djangoproject.com/ticket/29563> Django <https://code.djangoproject.com/> The Web framework for perfectionists with deadlines. -- You received this message because you are subscribed to the Google Groups "Django updates" group. To unsubscribe from this group and stop receiving emails from it, send an email to django-updates+unsubscr...@googlegroups.com. To post to this group, send email to django-updates@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/django-updates/050.6918a06e655a75340b8a97844eb0f4e2%40djangoproject.com. For more options, visit https://groups.google.com/d/optout.