On Mon, Feb 16, 2009 at 7:42 PM, Konstantin S <ktechli...@gmail.com> wrote:
>
> Hello!
>
> I am trying to load a really huge dataset (millions of records)
> into the database through fixtures, and it seems that Django loads the
> entire data into memory before committing it to the db, so the Python
> process dies out of memory. Am I right that the only possible solution
> in my case is not to use fixtures at all and to load the data by script?

This is correct.

There are several reasons for this, most of which come back to the
fact that the serialization tools that form the basis of the fixture
loaders are primarily intended for use as test data, or for the
transfer of small amounts of data (such as records to service an AJAX
request). Serializing an entire multi-gigabyte database wasn't part of
the original design specification. The XML serializer is the only
serializer that uses an event-based API under the hood, but even the
XML serializer attempts to process the entire file before committing
to the database.

However, even if Django's serializers used an event-based API which
would allow for streamed loading, you would almost certainly find that
using an SQL script will be faster anyway. While the overhead imposed
by using the parsers and the Django ORM is not especially large, if
you are dealing with _huge_ datasets, even a small per-object overhead
will add up to a non-trivial difference in overall processing time.
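To make the overhead point concrete, here's a rough sketch of the two
shapes of insert traffic, again using sqlite3 so it stands alone (the
table is hypothetical). The first issues one statement per object, which
is roughly what a serializer calling save() in a loop does; the second
is a single bulk insert, closer to what a plain SQL script achieves:

```python
import sqlite3

def make_db():
    # Hypothetical schema for illustration.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE record (name TEXT, value INTEGER)")
    return conn

def insert_per_object(conn, rows):
    # One round-trip per object -- a small fixed cost paid millions
    # of times over a huge dataset.
    for name, value in rows:
        conn.execute("INSERT INTO record VALUES (?, ?)", (name, value))
    conn.commit()

def insert_bulk(conn, rows):
    # One executemany() over the whole set: the per-object cost
    # mostly disappears.
    conn.executemany("INSERT INTO record VALUES (?, ?)", rows)
    conn.commit()
```

Both end up with the same data; the difference only shows in wall-clock
time once the row count gets large.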

Yours,
Russ Magee %-)

--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups 
"Django users" group.
To post to this group, send email to django-users@googlegroups.com
To unsubscribe from this group, send email to 
django-users+unsubscr...@googlegroups.com
For more options, visit this group at 
http://groups.google.com/group/django-users?hl=en
-~----------~----~----~----~------~----~------~--~---