Re: Experiences with jOOQ / join deduplication

Lukas Eder Tue, 02 Dec 2014 02:14:30 -0800

I've browsed through the archives of discussions and reflections on this
matter and I now remember this piece. The LEFT OUTER JOIN approach to
denormalising relations prior to serialisation over the wire is actually
completely wrong if you want to model queries to an object graph in Java.
Consider the following Java "schema"


class Author {

    // The author name
    String name;

    // The books he's written
    Book[] books;

    // The films he's written
    Film[] films;
}


If you had LEFT OUTER JOINed both BOOK and FILM to the AUTHOR table, you'd
wind up with a cartesian product BOOK × FILM, which has no meaning at all.
LEFT OUTER JOIN seems to be useful at first sight when navigating down a
single acyclic path of relationships. As soon as you will encounter cycles
or multiple paths, denormalisation will prevent you from further pursuing
this solution approach.

There are essentially two solutions to this problem:

*1. Run multiple queries*

If I'm not mistaken, that's what Hibernate does. You can easily run a first
query for all the authors, and then run a second query to get all the books
and a third one to get all the films:

SELECT * FROM author WHERE [ some predicate ]
SELECT * FROM book WHERE author_id IN (SELECT id FROM author WHERE [ some
predicate ])
SELECT * FROM film WHERE author_id IN (SELECT id FROM author WHERE [ some
predicate ])


The above is optimal. Less optimal would be to pass in an IN-predicate list
of author IDs in both subsequent queries. Even worse would be N+1, i.e.
fetch each book/film for each author one record at a time.

*2. Do it the SQL way*

The SQL standard knows the concept of collection types, such as arrays,
sets, or multisets. These can be used to create nested sets, which would
really be the best match for object-graph queries.

SELECT
    a.name,
    MULTISET(SELECT * FROM book WHERE author_id = a.id) books
    MULTISET(SELECT * FROM film WHERE author_id = a.id) films
FROM
    author a
WHERE
    [ some predicate ]


Unfortunately, only few databases implement these operators, e.g. Oracle or
Informix. If they did, we would have never gone through the hassle of
implementing ORMs

2014-12-02 9:18 GMT+01:00 Lukas Eder <[email protected]>:

> Hi Oskar,
>
> I've left this discussion for others to answer, but it looks as though
> there hasn't been much feedback so far.
>
> We are aware of the lack of "join deduplication" support being a certain
> limitation to jOOQ (or to any SQL oriented approach). As I told you before
> off the list, the main reason for this problem is the fact that by using
> JOIN or LEFT JOIN, you are effectively denormalising your result set,
> bringing it into a tabular form that doesn't contain the full information
> of your original join path anymore. Automatic "deduplication" isn't really
> possible even if it appears to be in very simple cases.
>
> HQL and JPQL do not have this problem, because they are languages tailored
> precisely to navigate object graphs and you have no control over the number
> of SQL queries that are effectively generated under the hood in order to
> materialise such an object graph. While this can be very convenient, their
> approach leads to many other missing parts, e.g. UNIONs, FULL OUTER JOINs,
> CTE and derived tables and many other SQL features that can only be done
> with SQL, which again doesn't work well with second-level caching, which
> again is needed to navigate, and more importantly, persist the object graph
> efficiently.
>
> The way more SQL-oriented APIs usually work around this limitation is by
> providing synthetic join operations that are not really SQL joins. We've
> been looking into this kind of solution in the past but haven't come around
> to implementing it. For instance, LLBLGen calls these things "prefetch
> paths":
>
> http://www.llblgen.com/documentation/4.2/LLBLGen%20Pro%20RTF/Using%20the%20generated%20code/Adapter/gencode_prefetchpaths_adapter.htm
>
> The can of worms you refer to is probably inevitable. Consider the
> following section from the SQLAlchemy manual:
>
>
> The Importance of Ordering
>
> A query which makes use of subqueryload()
> <http://docs.sqlalchemy.org/en/rel_0_9/orm/loading.html#sqlalchemy.orm.subqueryload>
>  in
> conjunction with a limiting modifier such as Query.first()
> <http://docs.sqlalchemy.org/en/rel_0_9/orm/query.html#sqlalchemy.orm.query.Query.first>
> ,Query.limit()
> <http://docs.sqlalchemy.org/en/rel_0_9/orm/query.html#sqlalchemy.orm.query.Query.limit>,
> or Query.offset()
> <http://docs.sqlalchemy.org/en/rel_0_9/orm/query.html#sqlalchemy.orm.query.Query.offset>
>  should *always* include Query.order_by()
> <http://docs.sqlalchemy.org/en/rel_0_9/orm/query.html#sqlalchemy.orm.query.Query.order_by>
>  against
> unique column(s) such as the primary key, so that the additional queries
> emitted by subqueryload()
> <http://docs.sqlalchemy.org/en/rel_0_9/orm/loading.html#sqlalchemy.orm.subqueryload>
>  include
> the same ordering as used by the parent query. Without it, there is a
> chance that the inner query could return the wrong rows
>
>
> As soon as a framework adds synthetic clauses to the SQL language, those
> clauses will have a huge impact on all of the language. This is true for
> jOOQ more than for any of the mentioned SQL APIs, as jOOQ really models all
> of SQL as a language.
>
> Long story short, we have recognised this limitation and we're thinking
> about it and we also appreciate all feedback from the community with
> respect to possible solutions. But it is not at all an easy problem to
> solve thoroughly. As a library vendor, there's no such thing as a "not
> automatic but just something" solution, I'm afraid.
>
> Having said so, if there is a quick win where we could add some
> convenience API, that'll do of course. We're constantly adding new methods
> that do similar things as the existing ones but with less code.
>
> Regards,
> Lukas
>
> 2014-11-24 12:33 GMT+01:00 Oskar Norrback <[email protected]>:
>
>> Hello guys!
>>
>> We are a new startup and decided to try jOOQ instead of JPA in the
>> backend of our first product and we haven’t regretted it so far. We wrote a
>> blog post summarizing our experiences and thoughts with jOOQ here
>> <http://teonos.com/blog/java/development/2014/11/10/experiences-with-jOOQ.html>
>> .
>>
>> The biggest minus with using jOOQ so far has been the “join deduplication
>> boilerplate” issue I mention in the blog post.
>>
>> Does my suggestion of adding some basic association support between the
>> generated Record types make sense to anyone else? E.g. using my blog
>> example: that a CustomerRecord can hold a list of OrderRecords?
>>
>> The next step would then be that I could automagically convert a Result
>> of a CUSTOMER.leftJoin(ORDER) query into CustomerRecords with the orders
>> populated. Currently I have to write this logic myself, repeatedly in
>> slight variations and it feels like it's boilerplate that could perhaps
>> come with the library. Looks like at least some of the Python ActiveRecord
>> implementations support associations to some extent (SQLObject
>> <http://sqlobject.org/SQLObject.html#one-to-many-relationships>,
>> SQLAlchemy <http://docs.sqlalchemy.org/en/rel_0_9/orm/loading.html>).
>> I'm not looking for automatic fetching of associations (and the can of
>> worms that comes with), just something that would simplify the step of
>> converting JOIN results into Records with associations.
>>
>>
>> Regards,
>>
>> Oskar Norrback
>>
>> --
>> You received this message because you are subscribed to the Google Groups
>> "jOOQ User Group" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to [email protected].
>> For more options, visit https://groups.google.com/d/optout.
>>
>
>

-- 
You received this message because you are subscribed to the Google Groups "jOOQ 
User Group" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
For more options, visit https://groups.google.com/d/optout.

Re: Experiences with jOOQ / join deduplication

Reply via email to