On Sep 12, 3:38 pm, "Jonas H." <jo...@lophus.org> wrote:
> On 09/12/2011 12:15 AM, Anssi Kääriäinen wrote:
>
> > The feature could be useful if there are users loading big fixture
> > files regularly. Otherwise it complicates fixture loading for little
> > gain.
>
> Maybe we could simply add an option to the loaddata command -- so that
> if someone really needs tons of fixtures for their tests it's possible
> to profit from bulk insertions by manually invoking loaddata from their
> test code. And the implementation is quite simple:
>
> http://paste.pocoo.org/show/474602/ (doesn't cover all edge-cases yet)
>
> I did some benchmarking with this code and it speeds up fixture loading
> *a lot*: http://www.chartgo.com/get.do?id=bdfe6af778 (chunksize=0 does
> not use `bulk_create` but `save`; the speedup seen even at chunksize=1
> comes from `bulk_create` avoiding the per-object `save` overhead)
>
> Jonas

I like this idea much better than trying to hack loaddata to use
bulk_create while maintaining compatibility with the current code.
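For reference, the chunked approach could look roughly like this (a minimal sketch with made-up names, not Jonas's actual paste; `bulk_load` and the default chunk size are invented):

```python
# Sketch: group deserialized objects by model class and flush each group
# with bulk_create once it reaches the chunk size, avoiding per-object
# save() calls. Helper name and chunk size are hypothetical.
from collections import defaultdict

def bulk_load(deserialized_objects, chunk_size=500):
    pending = defaultdict(list)  # model class -> unsaved instances
    for wrapped in deserialized_objects:
        obj = wrapped.object  # DeserializedObject wraps the model instance
        pending[obj.__class__].append(obj)
        if len(pending[obj.__class__]) >= chunk_size:
            obj.__class__.objects.bulk_create(pending.pop(obj.__class__))
    for model, objects in pending.items():  # flush the remainders
        model.objects.bulk_create(objects)
```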

The hard limitations would be as follows:
  - There must not be any updates: the fixture may only create new
rows, never overwrite existing ones.

Then there are limitations which could be lifted later on:

  - No natural keys (or the targets of the natural keys must already
exist in the DB). I think this could be lifted later on: the dumped
objects are ordered so that natural key references do not form cycles,
so just save the objects in the same order and resolve the natural
keys when saving, not when deserializing.

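That ordering guarantee is what would make save-time resolution work; a rough sketch (the `entries` shape and helper name are made up, not the real serializer internals):

```python
# Sketch of resolving natural keys at save time instead of at
# deserialization time. Because dumpdata orders objects so natural-key
# references never point forward, every referenced row already exists
# by the time we reach it. Data shapes here are invented.
def save_in_order(entries):
    # entries: [(model, field_values, {fk_attname: (target_model, natural_key)})]
    for model, field_values, natural_refs in entries:
        for attname, (target_model, key) in natural_refs.items():
            target = target_model.objects.get_by_natural_key(*key)
            field_values[attname] = target.pk  # resolved only now, at save time
        model.objects.create(**field_values)
```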
  - Inherited models must be saved the normal way. This could be
lifted: make bulk_create insert inherited objects if they have the PK
set, trusting that the user will insert the base objects in the same
transaction, or that they are already present. That is, create a "raw"
mode for bulk_create similar to the one Model.save_base has.

  - Objects with M2M data are saved the normal way. This could be
improved later on, so that m2m data would also be bulk saved along
with the objects.

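Bulk-saving the M2M data later could mean inserting straight into the through tables; a sketch using the `field.rel.through` / `m2m_field_name()` API (the helper name and input shape are invented):

```python
# Sketch of bulk-saving M2M data via the through tables: instead of one
# .add() per pair, build through-model rows and insert them with a
# single bulk_create per relation. Helper name and input shape invented.
def bulk_save_m2m(model, m2m_rows):
    # m2m_rows: {m2m_field_name: [(source_pk, target_pk), ...]}
    for field_name, pairs in m2m_rows.items():
        field = model._meta.get_field(field_name)
        through = field.rel.through
        source = field.m2m_field_name() + "_id"          # FK column back to `model`
        target = field.m2m_reverse_field_name() + "_id"  # FK column to the target
        through.objects.bulk_create(
            [through(**{source: s, target: t}) for s, t in pairs]
        )
```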
  - All objects must be loaded into memory. This is easy to lift: just
flush the collected objects once per N objects. I am not sure of this,
but you can probably also flush the collected objects whenever a new
class appears, since the objects are serialized one class at a time.

  - Signals aren't sent at all. It would be easy to batch-send them
if wanted.
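Batch-sending could look like this (helper name invented; the signal objects would be `django.db.models.signals.pre_save`/`post_save`, passed in here so the sketch stays framework-independent):

```python
# Sketch of replaying the skipped save signals around a bulk insert:
# pre_save for every object before the insert, post_save after it.
# loaddata sends these with raw=True, so the sketch does the same.
def bulk_create_with_signals(model, objects, pre_save, post_save):
    for obj in objects:
        pre_save.send(sender=model, instance=obj, raw=True)
    model.objects.bulk_create(objects)
    for obj in objects:
        post_save.send(sender=model, instance=obj, created=True, raw=True)
```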

My version of the patch solves all those cases in a way compatible
with the current implementation. The biggest difference from your
version is that mine can also be used when running tests - but the
speed difference for Django's test suite is only somewhere around
2-3%. The cost is some added complexity, plus one SELECT per batch to
see which PKs are already in the DB and which are not. So it seems
there is not much point in the added complexity.

The most difficult problem is that my patch _will_ break some users'
fixture loading due to the SQL length / parameter count limitations of
different backends. This is hard to solve cleanly. For example,
SQLite3 seems to have a 999-parameter limit, so you can save 333
three-field models, 99 ten-field models, or just 9 hundred-field
models per statement. If bulk_create is behind an opt-in flag, the
backwards incompatibility is not a problem.
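The arithmetic is just the parameter budget divided by the field count (helper name made up):

```python
# Back-of-the-envelope batch sizing under a bound parameter limit such
# as SQLite3's 999: each row consumes one parameter per field, so the
# safe batch size is the budget integer-divided by the field count.
def max_batch_size(num_fields, max_params=999):
    return max_params // num_fields
```

which gives 333 rows at three fields, 99 at ten, and 9 at a hundred.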

So, in summary, it seems having a bulk_create flag is the only way
forward.

 - Anssi

-- 
You received this message because you are subscribed to the Google Groups 
"Django developers" group.