Re: [Django] #5423: "dumpdata" should stream output one row at a time

Django Wed, 10 Aug 2011 10:47:03 -0700

#5423: "dumpdata" should stream output one row at a time
--------------------------------------+-------------------------------------------
               Reporter:  adrian      |          Owner:  ramiro
                   Type:  New feature |         Status:  assigned
              Milestone:              |      Component:  Core (Serialization)
                Version:  SVN         |       Severity:  Normal
             Resolution:              |       Keywords:  dumpdata fixtures memory
           Triage Stage:  Accepted    |
    Needs documentation:  0           |      Has patch:  1
Patch needs improvement:  1           |    Needs tests:  0
                  UI/UX:  0           |  Easy pickings:  0
--------------------------------------+-------------------------------------------


Comment (by toofishes):

 Replying to [comment:40 ramiro]:
 > Unfortunately the patch isn't ready: some comments and further testing
 showed that in the PostgreSQL case big ORM queries are still consuming too
 much memory.
 >
 > I suspect the patch solves things on the Django side because it
 implements usage of QuerySet.iterator(). But we still need to add to e.g.
 our PostgreSQL backend the ability to signal psycopg2/RDBMS that it shouldn't
 fetch the whole queryset result from the server, maybe using server-side cursors.
 >

 I'm going to respectfully disagree with your switch away from "Ready for
 checkin".
 1. This patch still improves the situation considerably, as it eliminates
 the double caching problem that existed in both the database drivers and
 the Django querysets.
 2. While looking into server-side cursors, I was going to attempt to
 implement them, but it turns out the Django ORM is built with a fundamental
 limitation: you have one connection with one cursor to a given
 database, period. Take a look at `django/db/backends/__init__.py`; you
 will see that the cursor() method always returns the single cached cursor
 object from the BaseDatabaseWrapper object. This works if every query is
 fully executed and the cursor is available for immediate reuse, but with
 any sort of iteration you can't touch the shared cursor object and would
 need to create a new one solely for that purpose. This opens a whole new
 can of worms: now you have to encapsulate the cursor and close it when
 done, among other things, which is a non-trivial addition to the
 current database code.
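 For illustration only: a minimal sketch of the "dedicated cursor" approach
 from point 2, written against Python's standard-library sqlite3 module as a
 stand-in. It is not Django code, and sqlite3 has no server-side cursors.
 The hypothetical helper iter_rows() opens its own cursor instead of reusing
 the connection's shared one, fetches rows in chunks with fetchmany(), and
 closes the cursor in a finally block when iteration ends. With psycopg2, a
 server-side variant would pass a name to connection.cursor(), which makes
 psycopg2 declare a server-side cursor; how a Django backend would wire that
 in is an assumption here, not existing behavior.

```python
import sqlite3
from typing import Any, Iterator, Tuple


def iter_rows(conn: sqlite3.Connection, sql: str,
              chunk_size: int = 100) -> Iterator[Tuple[Any, ...]]:
    """Yield rows for `sql` from a cursor dedicated to this iteration.

    Hypothetical helper: rows are pulled `chunk_size` at a time with
    fetchmany(), and the private cursor is closed in `finally` when the
    loop finishes or the generator is closed early.
    """
    cursor = conn.cursor()  # new cursor; the shared one is left untouched
    try:
        cursor.execute(sql)
        while True:
            batch = cursor.fetchmany(chunk_size)
            if not batch:
                break
            yield from batch
    finally:
        cursor.close()


# Usage: `shared` plays the role of the single cached cursor described above.
conn = sqlite3.connect(":memory:")
shared = conn.cursor()
shared.execute("CREATE TABLE item (id INTEGER PRIMARY KEY, name TEXT)")
shared.executemany("INSERT INTO item (name) VALUES (?)",
                   [(f"item{i}",) for i in range(250)])

seen = 0
for _row in iter_rows(conn, "SELECT id, name FROM item ORDER BY id", 100):
    seen += 1
    # The shared cursor stays usable while the dedicated one is mid-iteration.
    shared.execute("SELECT COUNT(*) FROM item")
print(seen, shared.fetchone()[0])  # 250 250
```

 The cost point 2 describes is visible even here: the caller now owns a
 second cursor's lifetime, which is why the cleanup lives in the generator.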

 So in short, I think holding off on applying this is the wrong decision.
 Instead, another ticket, related to #13869 and this one, should be opened
 to add full support for server-side cursors and their management (or
 their equivalent) in every database backend. Then it should be a simple 1
 or 2 line adjustment to make dumpdata take advantage of this new feature.
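 On the dumpdata side, streaming amounts to consuming rows lazily and
 writing each serialized object as it arrives, instead of building the full
 list first. A minimal sketch with a hypothetical stream_json_array() helper
 (not Django's serializer API); the dicts mimic the model/pk/fields shape of
 Django's JSON fixtures, and fake_rows() is a made-up stand-in for a
 streamed queryset.

```python
import io
import json
from typing import Any, Dict, Iterable, Iterator, TextIO


def stream_json_array(objects: Iterable[Dict[str, Any]], out: TextIO) -> int:
    """Write `objects` to `out` as one JSON array, one element at a time.

    Hypothetical helper: only the current element is serialized in memory,
    so `objects` can be a generator fed by a streaming cursor.
    Returns the number of elements written.
    """
    count = 0
    out.write("[")
    for obj in objects:
        out.write(",\n" if count else "\n")
        out.write(json.dumps(obj))
        count += 1
    out.write("\n]" if count else "]")
    return count


def fake_rows(n: int) -> Iterator[Dict[str, Any]]:
    # Stand-in for a streamed queryset; "app.item" is a made-up model label.
    for pk in range(1, n + 1):
        yield {"model": "app.item", "pk": pk, "fields": {"name": f"item{pk}"}}


buf = io.StringIO()
written = stream_json_array(fake_rows(3), buf)
data = json.loads(buf.getvalue())
print(written, [obj["pk"] for obj in data])  # 3 [1, 2, 3]
```

 Peak memory then depends on the largest single object rather than on the
 table size, provided the row source itself does not cache.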

-- 
Ticket URL: <https://code.djangoproject.com/ticket/5423#comment:45>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.

-- 
You received this message because you are subscribed to the Google Groups 
"Django updates" group.
To post to this group, send email to [email protected].
To unsubscribe from this group, send email to 
[email protected].
For more options, visit this group at 
http://groups.google.com/group/django-updates?hl=en.
