Hello Lukas,
thank you for the throughout answer!
yes it was a mistake, i think i get the point now.
Thanks for the great performance blog entry as reference.
i will go with following plan for collecting data (i think i now got this)
- manually running multiple queries per nest level and matching results in
the client (avoid duplicating results)
- possible to use multiset carefully per nest level to get results, where
no large-ish data sets are used
- avoid joins that create cartesian product
i will go with following plan for filtering, sorting, paginating data on a
main table with conditions on nested sub-tables (can be multiple
nesting-levels)
- always group by main-table to avoid returning cartesian product
- never join tables that are not needed for filtering/sorting (only join
those tables really necessary)
- always use indexes on fields that are critical for filtering
- return only a list of ids of the main-table or a stream of ids to avoid
the situation you described ("Can you see the cartersian product for topic
1...")
For example (i still need to invest into seek-pagination instead of
offset/limit):
// only select the id of the main-table SelectJoinStep<Record1<T>>
selectFromStep = dsl() .select(idField) .from(table); // apply all joins
that the concrete implementation needs for applying filters/sorters on
sub-tables SelectJoinStep<Record1<T>> selectJoinedStep =
Joins.apply(selectFromStep, configJoins()); // apply all filters from the
given query and additional conditions of the concrete implementation
List<Condition> conditions = new ArrayList<>(queryJooqMapper.getFilters());
conditions.addAll(configConditions()); SelectConditionStep<Record1<T>>
selectConditionStep = selectJoinedStep.where(conditions); // group the
result by the id of the main-table and apply order/offset/limit return
selectConditionStep .groupBy(idField) .orderBy(queryJooqMapper.getSorter())
.offset(queryJooqMapper.getOffset()) .limit(queryJooqMapper.getLimit())
.fetch(idField);
thanks again for the input!
[email protected] schrieb am Freitag, 3. Oktober 2025 um 17:34:44 UTC+2:
> Perhaps, you should have asked, before refactoring ;) This isn't a
> software design question, but just a simple mistake
>
> On Fri, Oct 3, 2025 at 5:07 PM 'Bernd Huber' via jOOQ User Group <
> [email protected]> wrote:
>
>> Hello guys,
>>
>> jOOQ provides great power, and with great power comes great
>> responsibility :)
>>
>> I happily implemented a Repository Abstraction for my Team to use, but
>> it currently shows it's limits that result in bad performance, which is
>> mainly resulting from my inability to understand specific performance
>> implications, when writing SQL for my target database (mariadb).
>>
>> I don't have a specific question but seek a bit for a guide / cookbook
>> for writing performant SQL (via jOOQ DSL) for the business use case of a
>> Repository with remote-paginated query-functionality (filter, sort, keyset
>> pagination, ...).
>>
>> My initial abstraction expected the users to write everything in one big
>> query, that uses multisets and joins. The joins are needed for filtering
>> etc, and the multisets to get the complex results. For example:
>>
>>
>> *Repository Query Abstraction - Part 1 (Performance Problems)*
>> @Dependent public class TopicRepository extends AbstractRepository<TopicDTO,
>> UUID> { @Inject TopicMapper topicMapper; @Inject TopicLangMapper
>> topicLangMapper; @Inject TopicTagLangMapper topicTagLangMapper; @Inject
>> LabelMapper labelMapper; @Inject LabelLangMapper labelLangMapper; @Inject
>> UserTopicAnswerMapper userTopicAnswerMapper; @Inject
>> UserTopicAnswerLangMapper userTopicAnswerLangMapper; private boolean
>> onlyUntagged = false; // Default constructor for CDI public
>> TopicRepository() { super(TOPIC.TOPICID); } public TopicRepository
>> onlyUntagged(boolean onlyUntagged) { this.onlyUntagged = onlyUntagged;
>> return this; } @Override protected SelectFinalStep<Record1<TopicDTO>>
>> prepareQuery(XQuery query) throws InvalidDataException { final
>> QueryJooqMapper queryJooqMapper = new QueryJooqMapper(query, TOPIC)
>> .addMappableFields(TOPIC) .addMappableFields(TOPICLANG)
>> .addMappableFields(TOPICLABEL) .addMappableFields(LABEL)
>> .addMappableFields(LABELLANG) .addMappableFields(USER)
>> .addMappableFields(TOPICTAGLANG) .addMappableFields(USERTOPICANSWER)
>> .addMappableFields(USERTOPICANSWERLANG); UUID userId = null; if
>> (this.request()
>> instanceof UserRequestContext) { userId = ((UserRequestContext)
>> this.request()).getUserId();
>> } final Condition onlyUntaggedCondition =
>> TOPIC.TAGSGENERATEDAT.isNull(); return dsl() .select( row( TOPIC,
>> multiset( select( LABEL, multiset( select(LABELLANG) .from(LABELLANG)
>> .where(LABELLANG.LABELID.eq(LABEL.LABELID)) ).convertFrom(r -> r.map(rec ->
>> labelLangMapper.map(rec))) ).from(TOPICLABEL)
>> .join(LABEL).on(LABEL.LABELID.eq(TOPICLABEL.LABELID))
>> .where(TOPICLABEL.TOPICID.eq(TOPIC.TOPICID)) ).as("multisetLabels")
>> .convertFrom(r -> r.map(rec -> labelMapper.map(rec))), multiset(
>> select(TOPICTAGLANG) .from(TOPICTAGLANG)
>> .where(TOPICTAGLANG.TOPICID.eq(TOPIC.TOPICID)) ).as("multisetTags")
>> .convertFrom(r -> r.map(rec -> topicTagLangMapper.map(rec))), multiset(
>> select(TOPICLANG) .from(TOPICLANG)
>> .where(TOPICLANG.TOPICID.eq(TOPIC.TOPICID)) ).as("multisetLangs")
>> .convertFrom(r -> r.map(rec -> topicLangMapper.map(rec))), row(
>> USERTOPICANSWER, multiset( select(USERTOPICANSWERLANG)
>> .from(USERTOPICANSWERLANG)
>> .where(USERTOPICANSWERLANG.ANSWERID.eq(USERTOPICANSWER.ANSWERID))
>> ).convertFrom(r -> r.map(rec -> userTopicAnswerLangMapper.map(rec)))
>> ).convertFrom(r -> userTopicAnswerMapper.mapOrNull(r)) ).convertFrom(rec ->
>> topicMapper.map(rec)) ) .from(TOPIC)
>> .join(TOPICLANG).on(TOPICLANG.TOPICID.eq(TOPIC.TOPICID))
>> .leftJoin(USERTOPICANSWER).on(USERTOPICANSWER.TOPICID.eq(TOPIC.TOPICID).and(USERTOPICANSWER.USERID.eq(userId)))
>>
>> .where(queryJooqMapper.getFilters()) .and(onlyUntagged ?
>> onlyUntaggedCondition : DSL.trueCondition()) .groupBy(TOPIC.TOPICID)
>> .orderBy(queryJooqMapper.getSorter()) .offset(queryJooqMapper.getOffset())
>> .limit(queryJooqMapper.getLimit()); } }
>>
>> After testing a bit, i found that following two points are mainly
>> resulting in performance problems:
>> - many unbound joins (meaning that a join is made without further
>> filtering on the joined table)
>> - many multisets that mariadb does not seem to apply at the very end on
>> the already filtered results (unclear)
>>
>
> Why are you joining *and* collecting the child table in a subquery? Do
> either one thing or the other thing, but not both. When you look at any
> nested collection examples from jOOQ, you'll always find either one of
> these three approaches:
>
> 1. multiset() subqueries and (usually) no joins in the outermost query. If
> there are joins, they're typically to-one joins, not to-many joins
> 2. multsetAgg() on queries with only one to-many join path, not several
> branching paths
> 3. Collector based deduplicating and nesting of results on queries with
> only one to-many join path, not several branching paths
>
> In general (not in this case), if you're joining only one path of child
> tables, joins and multiset are indeed alternatives, and I've explored
> performance characteristics here:
>
> https://blog.jooq.org/the-performance-of-various-to-many-nesting-algorithms/
>
> But you shouldn't really join two child tables to one parent table in SQL
> (independently of jOOQ), or you're getting a cartesian product between
> TOPICLANG and USERTOPICANSER. Try it with a simple example, assuming
> PostgreSQL:
>
> with
>
> topic (id) as (values (1),(2),(3)),
>
> topic_lang (id, lang) as (values (1, 'en'), (1, 'de'), (2, 'en'), (3, 'de'
> )),
>
> topic_user (id, name) as (values (1, 'Jon'), (1, 'Doe'), (2, 'Jon'), (2,
> 'Doe'))
>
> select *
>
> from topic as t
>
> join topic_lang as tl on t.id = tl.id
>
> left join topic_user as tu on t.id = tu.id
>
> This results in:
>
> |id |id |lang|id |name|
> |---|---|----|---|----|
> |1 |1 |de |1 |Jon |
> |1 |1 |en |1 |Jon |
> |1 |1 |de |1 |Doe |
> |1 |1 |en |1 |Doe |
> |2 |2 |en |2 |Jon |
> |2 |2 |en |2 |Doe |
> |3 |3 |de | | |
>
> Can you see the cartesian product for topic 1? There are a total of 2x2
> rows. 2 for the languages x 2 for the users. This can result in a ton of
> rows, which are all unnecessary, because you're correlating each of these
> relations again in multiset derived tables *on each row*!
>
> I hope this helps,
> Lukas
>
--
You received this message because you are subscribed to the Google Groups "jOOQ
User Group" group.
To unsubscribe from this group and stop receiving emails from it, send an email
to [email protected].
To view this discussion visit
https://groups.google.com/d/msgid/jooq-user/1d216dc1-bae1-44bd-81c5-b2609b635c9dn%40googlegroups.com.