On Mon, Feb 16, 2009 at 7:42 PM, Konstantin S <ktechli...@gmail.com> wrote:
>
> Hello!
>
> I am trying to load a really huge dataset (containing millions of
> records) into the database through fixtures, and it seems that Django
> loads the entire dataset into memory before committing it to the db, so
> the Python process dies out of memory. Am I right that the only possible
> solution in my case is not to use fixtures at all and to load the data
> with a script?
This is correct. There are several reasons for this, most of which come
back to the fact that the serialization tools that form the basis of the
fixture loaders are primarily intended for test data, or for transferring
small amounts of data (such as records to service an AJAX request).
Serializing an entire multi-gigabyte database wasn't part of the original
design specification.

The XML serializer is the only one that uses an event-based API under the
hood, but even it processes the entire file before committing anything to
the database.

However, even if Django's serializers used an event-based API that allowed
streamed loading, you would almost certainly find that a plain SQL script
is faster anyway. While the overhead imposed by the parsers and the Django
ORM is not especially large, with _huge_ datasets even a small per-object
overhead adds up to a non-trivial difference in overall processing time.

Yours,
Russ Magee %-)

--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups "Django users" group.
To post to this group, send email to django-users@googlegroups.com
To unsubscribe from this group, send email to django-users+unsubscr...@googlegroups.com
For more options, visit this group at http://groups.google.com/group/django-users?hl=en
-~----------~----~----~----~------~----~------~--~---
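[Editor's note for the archive: a minimal sketch of the "load by script"
approach suggested above. It streams a dump line by line and inserts in
fixed-size batches, so memory use stays bounded regardless of dataset
size. The JSON-lines format, the `records` schema, and the use of sqlite3
are all hypothetical stand-ins for your actual dump format and database.]

```python
import json
import sqlite3
from itertools import islice

BATCH_SIZE = 1000  # rows inserted per executemany() call


def load_in_batches(conn, lines):
    """Insert JSON-lines records in fixed-size batches.

    `lines` can be any iterable of JSON strings (e.g. an open file
    object), so the whole dataset never has to fit in memory at once.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS records (id INTEGER PRIMARY KEY, name TEXT)"
    )
    it = iter(lines)
    while True:
        # Pull at most BATCH_SIZE lines off the iterator and decode them.
        batch = [json.loads(line) for line in islice(it, BATCH_SIZE)]
        if not batch:
            break
        conn.executemany(
            "INSERT INTO records (id, name) VALUES (:id, :name)", batch
        )
    conn.commit()


# Usage (hypothetical file name):
#   conn = sqlite3.connect("mydata.db")
#   with open("dump.jsonl") as f:
#       load_in_batches(conn, f)
```

The same batching idea applies if you load through the Django ORM instead
of raw SQL; the key point is committing in chunks rather than
deserializing the entire fixture first.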