#5423: "dumpdata" should stream output one row at a time
-------------------------------------+-------------------------------------
Reporter: adrian | Owner: ramiro
Type: New feature | Status: assigned
Milestone: | Component: Core (Serialization)
Version: SVN | Severity: Normal
Resolution: | Keywords: dumpdata fixtures memory
Triage Stage: Accepted |
Needs documentation: 0 | Has patch: 1
Patch needs improvement: 1 | Needs tests: 0
UI/UX: 0 | Easy pickings: 0
-------------------------------------+-------------------------------------
Comment (by toofishes):
Replying to [comment:40 ramiro]:
> Unfortunately the patch isn't ready; some comments and further testing
> showed that in the PostgreSQL case big ORM queries are still consuming too
> much memory.
>
> I suspect the patch solves things on the Django side because it
> implements usage of QuerySet.iterator(). But we still need to add to e.g.
> our PostgreSQL backend the ability to signal psycopg2/RDBMS that it
> shouldn't fetch the whole queryset from the server, maybe using
> server-side cursors.
>
I'm going to respectfully disagree with your switch away from "Ready for
checkin".
1. This patch still improves the situation substantially, as it eliminates
the double caching of results that existed before, once in the database
driver and again in the Django queryset.
2. I looked into server-side cursors and was going to attempt to implement
this, but it turns out the Django ORM is built with a fundamental
limitation: you have one connection with one cursor to a given database,
period. Take a look at `django/db/backends/__init__.py` and you will see
that the cursor() method always returns the single cached cursor object
from the BaseDatabaseWrapper object. This works if every query is fully
executed and the cursor is available for immediate reuse, but with any
sort of iteration you can't touch the shared cursor object and would need
to create a new one solely for that purpose. This opens a whole new can of
worms: now you have to encapsulate the cursor and close it when done,
among other things, which is a non-trivial addition to the current
database code.
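To illustrate the shared-cursor problem outside of Django, here is a minimal sketch using the standard library sqlite3 module. The `item` table is made up for the example and none of this is taken from Django's backend code. It only shows that running another query on a cursor mid-iteration discards the rows still pending, while a dedicated cursor that is closed when done keeps them.

```python
import sqlite3
from contextlib import closing

# Hypothetical table, used only for this illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE item (id INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO item (id) VALUES (?)", [(i,) for i in range(5)])

# Case 1: one shared cursor, reused in the middle of an iteration.
shared = conn.cursor()
shared.execute("SELECT id FROM item ORDER BY id")
shared.fetchone()                                 # consume the first row
shared.execute("SELECT COUNT(*) FROM item")       # unrelated query, same cursor
count = shared.fetchone()[0]
lost = shared.fetchall()                          # pending rows of the first query are gone

# Case 2: a dedicated cursor for the iteration, closed when finished.
with closing(conn.cursor()) as streaming:
    streaming.execute("SELECT id FROM item ORDER BY id")
    streaming.fetchone()                          # consume the first row
    other = conn.cursor()                         # unrelated query, separate cursor
    other.execute("SELECT COUNT(*) FROM item")
    other.fetchone()
    kept = streaming.fetchall()                   # remaining rows are still available
```

The second case is the extra bookkeeping I mean: someone has to own the dedicated cursor and make sure it gets closed.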
So in short, I think holding off on applying this is the wrong decision.
Another ticket should be opened, related to #13869 and this one, to add
full support for server-side cursors (or their equivalent) and their
management in every database backend. After that, it should be a simple
one- or two-line adjustment to make dumpdata take advantage of the new
feature.
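For what it's worth, the row-at-a-time output this ticket asks for can be sketched without Django at all. The following is only an illustration under my own assumptions: it uses sqlite3 instead of a real backend, and `stream_rows` and `dump_json` are made-up helper names, not Django's serializer API. It fetches rows in small chunks from a dedicated cursor and writes each one out immediately, so the full result list is never built in Python.

```python
import io
import json
import sqlite3
from contextlib import closing


def stream_rows(conn, sql, chunk_size=2):
    """Yield rows from a dedicated cursor, fetching chunk_size rows at a time."""
    with closing(conn.cursor()) as cur:
        cur.execute(sql)
        while True:
            rows = cur.fetchmany(chunk_size)
            if not rows:
                break
            for row in rows:
                yield row


def dump_json(rows, out):
    """Write rows as a JSON array, one row at a time."""
    out.write("[")
    for i, row in enumerate(rows):
        if i:
            out.write(", ")
        json.dump(list(row), out)
    out.write("]")


# Hypothetical table, used only for this illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE item (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO item VALUES (?, ?)", [(1, "a"), (2, "b"), (3, "c")])

buf = io.StringIO()
dump_json(stream_rows(conn, "SELECT id, name FROM item ORDER BY id"), buf)
dumped = buf.getvalue()
```

Whether the driver itself still buffers the whole result set is a separate matter, which is exactly why server-side cursor support belongs in its own ticket.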
--
Ticket URL: <https://code.djangoproject.com/ticket/5423#comment:45>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.