Re: Best Practices for structuring Query Abstractions with SQL / jOOQ

Lukas Eder Wed, 08 Oct 2025 16:02:14 -0700

You might find this useful:
https://blog.jooq.org/sql-join-or-exists-chances-are-youre-doing-it-wrong/


On Sat, Oct 4, 2025 at 10:05 PM 'Bernd Huber' via jOOQ User Group <
[email protected]> wrote:

> i still think i need to learn / relearn many SQL concepts, because i seem
> to be missing too much knowledge to write a filter/sort/paginate
> abstraction.
>
> For example:
> - i always used joins for filtering the main-table based on conditions on
> a sub-table, even though i read about "EXISTS" today, which i can use
> directly in conditions, and which will not lead to a cartesian product like
> joins do, and will just do what i need (filter -> at least one rec in
> subtable matches condition). This also helps to totally avoid groupBy
> - for sorting the main-table based on a field in a sub-table, i may still
> need to join this sub-table, but there is also potential to avoid cartesian
> product, because i may find a way to sort without joining.
>
> for example:
>
> DSL.exists(
>   DSL.selectOne().from(TOPICLANG)
>      .where(TOPICLANG.TOPICID.eq(TOPIC.TOPICID))
>      .and(TOPICLANG.LANGCODE.eq("en"))
>      .and(TOPICLANG.TITLE.likeIgnoreCase("%" + text + "%"))
> );
>
> i think i will try to get myself into understanding how to filter/sort
> without joining tables explicitly.
> That should be a first good step for a better design.
>
> Bernd Huber schrieb am Freitag, 3. Oktober 2025 um 18:21:32 UTC:
>
>> Hello Lukas,
>>
>> thank you for the throughout answer!
>>
>> yes it was a mistake, i think i get the point now.
>> Thanks for the great performance blog entry as reference.
>>
>> i will go with following plan for collecting data (i think i now got this)
>> - manually running multiple queries per nest level and matching results
>> in the client (avoid duplicating results)
>> - possible to use multiset carefully per nest level to get results, where
>> no large-ish data sets are used
>> - avoid joins that create cartesian product
>>
>> i will go with following plan for filtering, sorting, paginating data on
>> a main table with conditions on nested sub-tables (can be multiple
>> nesting-levels)
>> - always group by main-table to avoid returning cartesian product
>> - never join tables that are not needed for filtering/sorting (only join
>> those tables really necessary)
>> - always use indexes on fields that are critical for filtering
>> - return only a list of ids of the main-table or a stream of ids to avoid
>> the situation you described ("Can you see the cartersian product for topic
>> 1...")
>>
>> For example (i still need to invest into seek-pagination instead of
>> offset/limit):
>> // only select the id of the main-table SelectJoinStep<Record1<T>>
>> selectFromStep = dsl() .select(idField) .from(table); // apply all joins
>> that the concrete implementation needs for applying filters/sorters on
>> sub-tables SelectJoinStep<Record1<T>> selectJoinedStep =
>> Joins.apply(selectFromStep, configJoins()); // apply all filters from
>> the given query and additional conditions of the concrete implementation
>> List<Condition> conditions = new ArrayList<>(queryJooqMapper.getFilters());
>> conditions.addAll(configConditions()); SelectConditionStep<Record1<T>>
>> selectConditionStep = selectJoinedStep.where(conditions); // group the
>> result by the id of the main-table and apply order/offset/limit return
>> selectConditionStep .groupBy(idField) .orderBy(queryJooqMapper.getSorter())
>> .offset(queryJooqMapper.getOffset()) .limit(queryJooqMapper.getLimit())
>> .fetch(idField);
>> 
>>
>> thanks again for the input!
>>
>> [email protected] schrieb am Freitag, 3. Oktober 2025 um 17:34:44 UTC+2:
>>
>>> Perhaps, you should have asked, before refactoring ;) This isn't a
>>> software design question, but just a simple mistake
>>>
>>> On Fri, Oct 3, 2025 at 5:07 PM 'Bernd Huber' via jOOQ User Group <
>>> [email protected]> wrote:
>>>
>>>> Hello guys,
>>>>
>>>> jOOQ provides great power, and with great power comes great
>>>> responsibility :)
>>>>
>>>> I happily implemented a Repository Abstraction for my Team to use,  but
>>>> it currently shows it's limits that result in bad performance, which is
>>>> mainly resulting from my inability to understand specific performance
>>>> implications, when writing SQL for my target database (mariadb).
>>>>
>>>> I don't have a specific question but seek a bit for a guide / cookbook
>>>> for writing performant SQL (via jOOQ DSL) for the business use case of a
>>>> Repository with remote-paginated query-functionality (filter, sort, keyset
>>>> pagination, ...).
>>>>
>>>> My initial abstraction expected the users to write everything in one
>>>> big query, that uses multisets and joins. The joins are needed for
>>>> filtering etc, and the multisets to get the complex results. For example:
>>>>
>>>>
>>>> *Repository Query Abstraction - Part 1 (Performance Problems)*
>>>> @Dependent public class TopicRepository extends 
>>>> AbstractRepository<TopicDTO,
>>>> UUID> { @Inject TopicMapper topicMapper; @Inject TopicLangMapper
>>>> topicLangMapper; @Inject TopicTagLangMapper topicTagLangMapper; @Inject
>>>> LabelMapper labelMapper; @Inject LabelLangMapper labelLangMapper;
>>>> @Inject UserTopicAnswerMapper userTopicAnswerMapper; @Inject
>>>> UserTopicAnswerLangMapper userTopicAnswerLangMapper; private boolean
>>>> onlyUntagged = false; // Default constructor for CDI public
>>>> TopicRepository() { super(TOPIC.TOPICID); } public TopicRepository
>>>> onlyUntagged(boolean onlyUntagged) { this.onlyUntagged = onlyUntagged;
>>>> return this; } @Override protected SelectFinalStep<Record1<TopicDTO>>
>>>> prepareQuery(XQuery query) throws InvalidDataException { final
>>>> QueryJooqMapper queryJooqMapper = new QueryJooqMapper(query, TOPIC)
>>>> .addMappableFields(TOPIC) .addMappableFields(TOPICLANG)
>>>> .addMappableFields(TOPICLABEL) .addMappableFields(LABEL)
>>>> .addMappableFields(LABELLANG) .addMappableFields(USER)
>>>> .addMappableFields(TOPICTAGLANG) .addMappableFields(USERTOPICANSWER)
>>>> .addMappableFields(USERTOPICANSWERLANG); UUID userId = null; if 
>>>> (this.request()
>>>> instanceof UserRequestContext) { userId = ((UserRequestContext) 
>>>> this.request()).getUserId();
>>>> } final Condition onlyUntaggedCondition =
>>>> TOPIC.TAGSGENERATEDAT.isNull(); return dsl() .select( row( TOPIC,
>>>> multiset( select( LABEL, multiset( select(LABELLANG) .from(LABELLANG)
>>>> .where(LABELLANG.LABELID.eq(LABEL.LABELID)) ).convertFrom(r -> r.map(rec ->
>>>> labelLangMapper.map(rec))) ).from(TOPICLABEL)
>>>> .join(LABEL).on(LABEL.LABELID.eq(TOPICLABEL.LABELID))
>>>> .where(TOPICLABEL.TOPICID.eq(TOPIC.TOPICID)) ).as("multisetLabels")
>>>> .convertFrom(r -> r.map(rec -> labelMapper.map(rec))), multiset(
>>>> select(TOPICTAGLANG) .from(TOPICTAGLANG)
>>>> .where(TOPICTAGLANG.TOPICID.eq(TOPIC.TOPICID)) ).as("multisetTags")
>>>> .convertFrom(r -> r.map(rec -> topicTagLangMapper.map(rec))), multiset(
>>>> select(TOPICLANG) .from(TOPICLANG)
>>>> .where(TOPICLANG.TOPICID.eq(TOPIC.TOPICID)) ).as("multisetLangs")
>>>> .convertFrom(r -> r.map(rec -> topicLangMapper.map(rec))), row(
>>>> USERTOPICANSWER, multiset( select(USERTOPICANSWERLANG)
>>>> .from(USERTOPICANSWERLANG)
>>>> .where(USERTOPICANSWERLANG.ANSWERID.eq(USERTOPICANSWER.ANSWERID))
>>>> ).convertFrom(r -> r.map(rec -> userTopicAnswerLangMapper.map(rec)))
>>>> ).convertFrom(r -> userTopicAnswerMapper.mapOrNull(r)) ).convertFrom(rec ->
>>>> topicMapper.map(rec)) ) .from(TOPIC)
>>>> .join(TOPICLANG).on(TOPICLANG.TOPICID.eq(TOPIC.TOPICID))
>>>> .leftJoin(USERTOPICANSWER).on(USERTOPICANSWER.TOPICID.eq(TOPIC.TOPICID).and(USERTOPICANSWER.USERID.eq(userId)))
>>>> .where(queryJooqMapper.getFilters()) .and(onlyUntagged ?
>>>> onlyUntaggedCondition : DSL.trueCondition()) .groupBy(TOPIC.TOPICID)
>>>> .orderBy(queryJooqMapper.getSorter()) .offset(queryJooqMapper.getOffset())
>>>> .limit(queryJooqMapper.getLimit()); } }
>>>>
>>>> After testing a bit, i found that following two points are mainly
>>>> resulting in performance problems:
>>>> - many unbound joins (meaning that a join is made without further
>>>> filtering on the joined table)
>>>> - many multisets that mariadb does not seem to apply at the very end on
>>>> the already filtered results (unclear)
>>>>
>>>
>>> Why are you joining *and* collecting the child table in a subquery? Do
>>> either one thing or the other thing, but not both. When you look at any
>>> nested collection examples from jOOQ, you'll always find either one of
>>> these three approaches:
>>>
>>> 1. multiset() subqueries and (usually) no joins in the outermost query.
>>> If there are joins, they're typically to-one joins, not to-many joins
>>> 2. multsetAgg() on queries with only one to-many join path, not several
>>> branching paths
>>> 3. Collector based deduplicating and nesting of results on queries with
>>> only one to-many join path, not several branching paths
>>>
>>> In general (not in this case), if you're joining only one path of child
>>> tables, joins and multiset are indeed alternatives, and I've explored
>>> performance characteristics here:
>>>
>>> https://blog.jooq.org/the-performance-of-various-to-many-nesting-algorithms/
>>>
>>> But you shouldn't really join two child tables to one parent table in
>>> SQL (independently of jOOQ), or you're getting a cartesian product between
>>> TOPICLANG and USERTOPICANSER. Try it with a simple example, assuming
>>> PostgreSQL:
>>>
>>> with
>>>
>>> topic (id) as (values (1),(2),(3)),
>>>
>>> topic_lang (id, lang) as (values (1, 'en'), (1, 'de'), (2, 'en'), (3,
>>> 'de')),
>>>
>>> topic_user (id, name) as (values (1, 'Jon'), (1, 'Doe'), (2, 'Jon'), (2,
>>> 'Doe'))
>>>
>>> select *
>>>
>>> from topic as t
>>>
>>> join topic_lang as tl on t.id = tl.id
>>>
>>> left join topic_user as tu on t.id = tu.id
>>>
>>> This results in:
>>>
>>> |id |id |lang|id |name|
>>> |---|---|----|---|----|
>>> |1  |1  |de  |1  |Jon |
>>> |1  |1  |en  |1  |Jon |
>>> |1  |1  |de  |1  |Doe |
>>> |1  |1  |en  |1  |Doe |
>>> |2  |2  |en  |2  |Jon |
>>> |2  |2  |en  |2  |Doe |
>>> |3  |3  |de  |   |    |
>>>
>>> Can you see the cartesian product for topic 1? There are a total of 2x2
>>> rows. 2 for the languages x 2 for the users. This can result in a ton of
>>> rows, which are all unnecessary, because you're correlating each of these
>>> relations again in multiset derived tables *on each row*!
>>>
>>> I hope this helps,
>>> Lukas
>>>
>> --
> You received this message because you are subscribed to the Google Groups
> "jOOQ User Group" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To view this discussion visit
> https://groups.google.com/d/msgid/jooq-user/4aa6dc70-1536-40e1-867b-44691d8e265bn%40googlegroups.com
> <https://groups.google.com/d/msgid/jooq-user/4aa6dc70-1536-40e1-867b-44691d8e265bn%40googlegroups.com?utm_medium=email&utm_source=footer>
> .
>

-- 
You received this message because you are subscribed to the Google Groups "jOOQ 
User Group" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To view this discussion visit 
https://groups.google.com/d/msgid/jooq-user/CAB4ELO5xGcK2nCJdZQ%2B6n-2TAJod%3DZX3CHai5Jk92H88--9sxg%40mail.gmail.com.

Re: Best Practices for structuring Query Abstractions with SQL / jOOQ

Reply via email to