Re: [doctrine-user] Commit order calculation / Bulk Insert

[email protected] Sun, 19 Oct 2014 09:26:43 -0700

Hi,

I hope I can contribute with more explanatory responses here.

> The first question is: Why does doctrine sort the commit order by classes
> and not the actual entity graph it could generate by using the information
> of UnitOfWork and `associationMappings`?

When we created the CommitOrderCalculator, Roman and I debated which
approach would be the most performant way to define the commit order:
1- Consider purely known entities inside of UoW
2- Consider the known ClassMetadata

The reasons why we chose option 2 are very simple:

- We are always considering a fixed set (the known classes), not one that
floats with the number of known entities
- With option 1, large changesets (bulk operations, for example) would take
a critical amount of time to be processed.
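
To make the trade-off concrete, here is a minimal sketch in Python (not
Doctrine's actual code; the function names and the shape of the inputs are
assumptions made for illustration). The sort runs over the fixed set of
classes, and the entities are then only grouped by class, so the cost of the
ordering step does not grow with the size of the changeset:

```python
from collections import defaultdict


def class_commit_order(classes, depends_on):
    """Depth-first topological sort over classes; dependencies come first.

    `depends_on` maps a class name to the class names it references
    (a stand-in for what the association mappings would provide).
    """
    order, visited = [], set()

    def visit(cls):
        if cls in visited:
            return
        visited.add(cls)
        for dep in depends_on.get(cls, ()):
            visit(dep)
        order.append(cls)

    for cls in classes:
        visit(cls)
    return order


def insert_order(entities, classes, depends_on):
    """Order (class_name, entity) pairs by the class-level commit order."""
    by_class = defaultdict(list)
    for class_name, entity in entities:
        by_class[class_name].append(entity)
    return [e for cls in class_commit_order(classes, depends_on)
            for e in by_class[cls]]
```

The graph being sorted has one node per class, however many entities are
scheduled for insertion.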

> However, after digging into Doctrine with my debugger, I saw Doctrine
> generates for each entity a separate INSERT query, even if each identifier
> is known using sequences in PostgreSQL.

We knew that from the start. We penalized bulk operations because most of
the time we're dealing with auto-generated keys, which must be mapped back
to the entities as their identifiers.
We could, however, determine the execution method based on class metadata
information (looking at its idGenerator type) and execute either single or
multi statements accordingly. The problems we found were all related to how
we could get back all the IDs from a single statement. Remember that
sequence-based drivers may be running under heavy concurrency, so the
numeric sequence may not be fully deterministic. I remember setting up an
Oracle XE at home to do some tests around that.
The other problem was that the UoW complexity would increase and degrade
the overall optimized performance.
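
As a rough illustration of that idea (a sketch only, not something Doctrine
implements; the function name `plan_inserts` and the generator labels
"assigned" and "sequence" are made up for this example), the executor could
pick one multi-row statement when the IDs are already assigned, and fall
back to one statement per entity otherwise:

```python
def plan_inserts(table, columns, rows, id_generator):
    """Return a list of (sql, params) pairs for the given rows.

    One multi-row INSERT is planned only when the application assigned the
    IDs itself; any other generator keeps one statement per row so each
    generated ID can be mapped back to its entity.
    """
    col_list = ", ".join(columns)
    one_row = "(" + ", ".join("?" for _ in columns) + ")"

    if id_generator == "assigned" and rows:
        values = ", ".join(one_row for _ in rows)
        sql = f"INSERT INTO {table} ({col_list}) VALUES {values}"
        params = [value for row in rows for value in row]
        return [(sql, params)]

    sql = f"INSERT INTO {table} ({col_list}) VALUES {one_row}"
    return [(sql, list(row)) for row in rows]
```

With database-generated IDs, the per-row branch is what makes it
straightforward to read each ID back right after its own statement.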

> I also discovered that when having a (optional) circular dependency
> between entities Doctrine resolves those always with a 'INSERT without FK'
> value and later a 'UPDATE SET FK=x' strategy, even when a circular
> dependency is not given in *this* commit-round. (but of course it might
> happen in the next flush/commit round).

CommitOrderCalculator is very loose on full-round circular dependencies. It
does not break if a visited node is already IN_PROGRESS. That is the subtle
difference between
https://github.com/doctrine/doctrine2/blob/master/lib/Doctrine/ORM/Internal/CommitOrderCalculator.php
and
https://github.com/doctrine/data-fixtures/blob/2.0/lib/Doctrine/Fixture/Sorter/TopologicalSorter.php
Funnily enough, you're the first person ever to even mention that.
Now, getting back to your point: because our CommitOrderCalculator is
loose, we had to trick the executor by enforcing extraUpdates inside the UoW.
That was our decision, based on the fact that most PHP apps/developers never
cared about a full dependency breakdown. Being strict could potentially have
reduced adoption of Doctrine and also produced innumerable bug reports.
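
The difference between the two behaviours can be sketched like this (a
simplified Python illustration, not the code behind either link above; the
`strict` flag, the `deferred` list and the exception name are assumptions
made for the example). The loose variant keeps going when it meets a node
that is still IN_PROGRESS and records that edge so it can be fixed up later,
which is the role the extra updates play; the strict variant stops:

```python
NOT_VISITED, IN_PROGRESS, VISITED = 0, 1, 2


class CircularDependencyError(Exception):
    pass


def topological_sort(nodes, depends_on, strict):
    """Return (order, deferred); dependencies come first in `order`.

    `deferred` lists (node, dependency) edges that closed a cycle and were
    skipped in loose mode; each would need an UPDATE after the INSERTs.
    """
    state = {node: NOT_VISITED for node in nodes}
    order, deferred = [], []

    def visit(node):
        state[node] = IN_PROGRESS
        for dep in depends_on.get(node, ()):
            if state[dep] == NOT_VISITED:
                visit(dep)
            elif state[dep] == IN_PROGRESS:
                if strict:
                    raise CircularDependencyError(f"{node} -> {dep}")
                deferred.append((node, dep))  # loose: skip the edge, fix later
        state[node] = VISITED
        order.append(node)

    for node in nodes:
        if state[node] == NOT_VISITED:
            visit(node)
    return order, deferred
```

For two classes A and B that reference each other, the loose variant
returns B before A and defers the B -> A edge: B is inserted without its
foreign key, A is inserted, and B is updated afterwards.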

> This is actual the result of not using a entities graph to resolve
> dependencies but only the associationMappings and simple topological
> sorting.

Yes, that was the downside of choosing a deterministic amount of data to
process over a non-deterministic one.

> Is there a concrete reason why Doctrine has chosen this particular way to
> resolve relations/dependencies? I only wonder because this implementation
> does not utilize maximum performance of various databases using bulk
> insert etc, but rather suffers actually more from the overhead it
> generates.

We really wanted to get back to bulk operations if possible, but as it
stands right now, only the assigned ID generation type could benefit from
them. Since it's the least used generator type, the gain does not pay off
the performance hit we may get by adding an extra decoupling layer.

If you have any other questions, feel free to ask.

PS: Sorry for my late reply. I'm quite overloaded lately and your thread
almost got missed.

Cheers,

On Fri, Oct 17, 2014 at 1:01 PM, Marco Pivetta <[email protected]> wrote:

> On 17 October 2014 00:04, Marc J. Schmidt <[email protected]> wrote:
>
>>
>> Is there a concrete reason why Doctrine has chosen this particular way to
>> resolve
>> relations/dependencies? I only wonder because this implementation does not
>> utilize maximum performance of various databases using bulk insert etc,
>> but
>> rather suffers actually more from the overhead it generates.
>>
>
> In addition to what Benjamin already said, we also currently lack a
> bulk-insert API (Steve Müller is working on it, but it won't hit 2.5).
>
> Marco Pivetta
>
> http://twitter.com/Ocramius
>
> http://ocramius.github.com/

-- 
Guilherme Blanco
MSN: [email protected]
GTalk: guilhermeblanco
Toronto - ON/Canada

-- 
You received this message because you are subscribed to the Google Groups 
"doctrine-user" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To post to this group, send email to [email protected].
Visit this group at http://groups.google.com/group/doctrine-user.
For more options, visit https://groups.google.com/d/optout.
