Finally some good news on performance. After tweaking the prefetch
strategies, I got the following test numbers on PostgreSQL when
fetching/prefetching a few thousand objects (fewer milliseconds
means faster processing):
(disjoint)
n:1 ... M6 ...... 51 ms
n:1 ... trunk ... 45 ms
(joint)
n:1 ... M6 ...... 100 ms
n:1 ... trunk ... 45 ms
(disjoint)
1:n ... M6 ...... 100 ms
1:n ... trunk ... 54 ms
(disjoint)
n:m ... M6 ...... 54 ms
n:m ... trunk ... 51 ms
So the trunk code significantly improves on 3.0M6 when prefetching
to-many and joint to-one relationships, and somewhat improves on the
other cases (within the margin of error, I guess).
Andrus
On Sep 7, 2009, at 8:53 AM, Andrus Adamchik wrote:
Been thinking about the new prefetching model some more and found a
glaring performance hole - the most common N:1 prefetch case will
result in a Cartesian product being processed in memory. E.g. if one
Artist has 3 Paintings, and the Paintings are fetched with an Artist
prefetch, the Artist DB data will be read 3 times. The result will
be correct - 3 Paintings all pointing to a single Artist object -
but processing will be much slower.
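To illustrate the repeated reads, here is a minimal sketch in plain
Java. It is not the Cayenne prefetch code; the class names
(JointPrefetchSketch, JoinedRow) and the artistReads counter are
hypothetical, made up only to show the effect under the assumption
that each result row carries both Painting and Artist columns.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustration only: hypothetical classes, not part of Cayenne.
final class JointPrefetchSketch {

    /** Hypothetical flattened row: Painting columns plus the joined Artist columns. */
    static final class JoinedRow {
        final int paintingId;
        final int artistId;
        final String artistName;

        JoinedRow(int paintingId, int artistId, String artistName) {
            this.paintingId = paintingId;
            this.artistId = artistId;
            this.artistName = artistName;
        }
    }

    static final class Artist {
        final int id;
        final String name;

        Artist(int id, String name) {
            this.id = id;
            this.name = name;
        }
    }

    static final class Painting {
        final int id;
        final Artist toArtist;

        Painting(int id, Artist toArtist) {
            this.id = id;
            this.toArtist = toArtist;
        }
    }

    /** Counts how many times Artist columns were read from a row. */
    int artistReads = 0;

    List<Painting> resolve(List<JoinedRow> rows) {
        Map<Integer, Artist> artistsById = new HashMap<>();
        List<Painting> paintings = new ArrayList<>();

        for (JoinedRow row : rows) {
            // The Artist columns arrive with every Painting row,
            // so they are read once per Painting, not once per Artist.
            artistReads++;

            // Only the first read creates an object; later reads are wasted work.
            Artist artist = artistsById.get(row.artistId);
            if (artist == null) {
                artist = new Artist(row.artistId, row.artistName);
                artistsById.put(row.artistId, artist);
            }
            paintings.add(new Painting(row.paintingId, artist));
        }
        return paintings;
    }
}
```

With three rows for the same Artist, the result is three Paintings
sharing one Artist instance, while the counter ends up at three.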
I will now make another pass over the code to restore the old
prefetch strategy for N:1 relationships. Hopefully the resulting
code will be tighter than it used to be.
Andrus
On Sep 6, 2009, at 9:43 PM, Andrus Adamchik wrote:
Good to have a little time again to hack Cayenne internals.
Just committed a pretty big change to the prefetching algorithm,
motivated by the CAY-1250 bug report. So combining prefetching and
inheritance now works 100%.
One visible effect of this change is that all disjoint prefetch
queries will now include the IDs of the source side of the
prefetch relationship and a mandatory join to the source entity. In
return for this small inefficiency (increased result set size...
hopefully most IDs are small), we get a bunch of benefits, the main
one being the ability to process related fetched objects in a
consistent manner regardless of the relationship semantics (1..1,
1..N, N..M). This strategy was used before for flattened
relationships; now it is used for everything. On the other hand,
this change made it possible to optimize some related cases, so all
in all, there may be no performance penalty.
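A rough sketch of why carrying the source ID makes processing
uniform, in plain Java. This is not the committed Cayenne code;
DisjointPrefetchSketch, PrefetchedRow and groupBySource are
hypothetical names, and the sketch assumes single-column integer
IDs for simplicity.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustration only: hypothetical classes, not part of Cayenne.
final class DisjointPrefetchSketch {

    /** Hypothetical prefetch row: the related object's ID plus the ID of its source. */
    static final class PrefetchedRow {
        final int sourceId;
        final int targetId;

        PrefetchedRow(int sourceId, int targetId) {
            this.sourceId = sourceId;
            this.targetId = targetId;
        }
    }

    /**
     * Groups related IDs under their source ID. The same loop serves
     * 1..1 (one target per source), 1..N (several targets per source)
     * and N..M (a target repeated under several sources).
     */
    static Map<Integer, List<Integer>> groupBySource(List<PrefetchedRow> rows) {
        Map<Integer, List<Integer>> targetsBySource = new LinkedHashMap<>();
        for (PrefetchedRow row : rows) {
            targetsBySource
                    .computeIfAbsent(row.sourceId, id -> new ArrayList<>())
                    .add(row.targetId);
        }
        return targetsBySource;
    }
}
```

No branch on the relationship type is needed; the type only shows
in the shape of the resulting lists.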
It is still possible to go back and optimize it further to prevent
the addition of the extra columns to the result set in some cases
(e.g. if both the joined FK and PK are present in the result, only
fetch one of them). I wish we could do that in some central
location (like SelectTranslator) instead of writing endless if/else
branches in the prefetch processing code.
Now the prefetch code is easier to make sense of, with fewer
if/else branches. And I am planning to refactor it further.
Also, I came very close to fixing the biggest remaining limitation
of disjoint prefetching:
https://issues.apache.org/jira/browse/CAY-1025
Andrus