Hello guys,
jOOQ provides great power, and with great power comes great responsibility
:)
I happily implemented a Repository Abstraction for my Team to use, but it
currently shows it's limits that result in bad performance, which is mainly
resulting from my inability to understand specific performance
implications, when writing SQL for my target database (mariadb).
I don't have a specific question but seek a bit for a guide / cookbook for
writing performant SQL (via jOOQ DSL) for the business use case of a
Repository with remote-paginated query-functionality (filter, sort, keyset
pagination, ...).
My initial abstraction expected the users to write everything in one big
query, that uses multisets and joins. The joins are needed for filtering
etc, and the multisets to get the complex results. For example:
*Repository Query Abstraction - Part 1 (Performance Problems)*
@Dependent public class TopicRepository extends AbstractRepository<TopicDTO,
UUID> { @Inject TopicMapper topicMapper; @Inject TopicLangMapper
topicLangMapper; @Inject TopicTagLangMapper topicTagLangMapper; @Inject
LabelMapper labelMapper; @Inject LabelLangMapper labelLangMapper; @Inject
UserTopicAnswerMapper userTopicAnswerMapper; @Inject
UserTopicAnswerLangMapper userTopicAnswerLangMapper; private boolean
onlyUntagged = false; // Default constructor for CDI public TopicRepository
() { super(TOPIC.TOPICID); } public TopicRepository onlyUntagged(boolean
onlyUntagged) { this.onlyUntagged = onlyUntagged; return this; } @Override
protected SelectFinalStep<Record1<TopicDTO>> prepareQuery(XQuery query)
throws InvalidDataException { final QueryJooqMapper queryJooqMapper = new
QueryJooqMapper(query, TOPIC) .addMappableFields(TOPIC)
.addMappableFields(TOPICLANG) .addMappableFields(TOPICLABEL)
.addMappableFields(LABEL) .addMappableFields(LABELLANG)
.addMappableFields(USER) .addMappableFields(TOPICTAGLANG)
.addMappableFields(USERTOPICANSWER)
.addMappableFields(USERTOPICANSWERLANG); UUID userId = null; if (this.request()
instanceof UserRequestContext) { userId = ((UserRequestContext)
this.request()).getUserId();
} final Condition onlyUntaggedCondition = TOPIC.TAGSGENERATEDAT.isNull();
return dsl() .select( row( TOPIC, multiset( select( LABEL, multiset(
select(LABELLANG) .from(LABELLANG)
.where(LABELLANG.LABELID.eq(LABEL.LABELID)) ).convertFrom(r -> r.map(rec ->
labelLangMapper.map(rec))) ).from(TOPICLABEL)
.join(LABEL).on(LABEL.LABELID.eq(TOPICLABEL.LABELID))
.where(TOPICLABEL.TOPICID.eq(TOPIC.TOPICID)) ).as("multisetLabels")
.convertFrom(r -> r.map(rec -> labelMapper.map(rec))), multiset(
select(TOPICTAGLANG) .from(TOPICTAGLANG)
.where(TOPICTAGLANG.TOPICID.eq(TOPIC.TOPICID)) ).as("multisetTags")
.convertFrom(r -> r.map(rec -> topicTagLangMapper.map(rec))), multiset(
select(TOPICLANG) .from(TOPICLANG)
.where(TOPICLANG.TOPICID.eq(TOPIC.TOPICID)) ).as("multisetLangs")
.convertFrom(r -> r.map(rec -> topicLangMapper.map(rec))), row(
USERTOPICANSWER, multiset( select(USERTOPICANSWERLANG)
.from(USERTOPICANSWERLANG)
.where(USERTOPICANSWERLANG.ANSWERID.eq(USERTOPICANSWER.ANSWERID))
).convertFrom(r -> r.map(rec -> userTopicAnswerLangMapper.map(rec)))
).convertFrom(r -> userTopicAnswerMapper.mapOrNull(r)) ).convertFrom(rec ->
topicMapper.map(rec)) ) .from(TOPIC)
.join(TOPICLANG).on(TOPICLANG.TOPICID.eq(TOPIC.TOPICID))
.leftJoin(USERTOPICANSWER).on(USERTOPICANSWER.TOPICID.eq(TOPIC.TOPICID).and(USERTOPICANSWER.USERID.eq(userId)))
.where(queryJooqMapper.getFilters()) .and(onlyUntagged ?
onlyUntaggedCondition : DSL.trueCondition()) .groupBy(TOPIC.TOPICID)
.orderBy(queryJooqMapper.getSorter()) .offset(queryJooqMapper.getOffset())
.limit(queryJooqMapper.getLimit()); } }
After testing a bit, i found that following two points are mainly resulting
in performance problems:
- many unbound joins (meaning that a join is made without further filtering
on the joined table)
- many multisets that mariadb does not seem to apply at the very end on the
already filtered results (unclear)
*Repository Query Abstraction - Part 2 (Splitting the Repository in Two)*
I thought about how to solve this, and it seems that a split in Two results
in a overall better splitting that my development team could work with
without needing to be an expert. For example:
First Part: Only execute query to get a list of Ids (Page)
@Dependent @Data @Accessors(chain = true) @EqualsAndHashCode(callSuper =
true) public class TopicSelector extends AbstractSelector<TopicRecord,
UUID> { boolean onlyUntagged; UUID userId; /** * Create a selector for
topic. * * @param dsl dsl * @param query query * @param onlyUntagged
onlyUntagged * @param userId userId * @return selector */ public static
TopicSelector create(DSLContext dsl, XQuery query, boolean onlyUntagged,
UUID userId) { return SelectorFactory.create(TopicSelector.class, dsl,
query, TOPIC, TOPIC.TOPICID, DOR.getTables())
.setOnlyUntagged(onlyUntagged) .setUserId(userId); } @Override protected
Map<String, Field<?>> configQueryFields() { return Map.ofEntries( entry(
"title", TOPICLANG.TITLE), entry("langCode", TOPICLANG.LANGCODE), entry(
"labels", TOPICLABEL.LABELID), entry("answered", USERTOPICANSWER.ANSWERID)
); } @Override protected List<Join> configJoins() { return List.of(
join(TOPICLANG).on(TOPICLANG.TOPICID.eq(TOPIC.TOPICID)),
leftJoin(TOPICLABEL).on(TOPICLABEL.TOPICID.eq(TOPIC.TOPICID)),
leftJoin(USERTOPICANSWER).on(USERTOPICANSWER.TOPICID.eq(TOPIC.TOPICID)
.and(USERTOPICANSWER.USERID.eq(userId))) ); } @Override protected
List<Condition> configConditions() { return List.of(onlyUntagged ?
TOPIC.TAGSGENERATEDAT.isNull() : DSL.trueCondition()); } }
The result of this Selector will be a List of Ids of the Main-Table that
the Selector works on (here: TOPIC), filtered and sorted by a given "query"
object that contains query-fields which are mapped to the corresponding
fields (TOPICLANG.TITLE, TOPICLANG.LANGCODE, ...), and are applied on the
configured joins with the configured additional conditions.
Second Part: Getting hydrated/extended DTOs for the given List of Ids
(Result of Part1)
@Dependent @Data @Accessors(chain = true) @EqualsAndHashCode(callSuper =
true) public class TopicRepository extends AbstractRepositoryV2<TopicDTO,
TopicRecord, UUID> { @Inject TopicLabelListLoader topicLabelListLoader;
@Inject TopicLangListLoader topicLangListLoader; @Inject
TopicTagLangListLoader topicTagLangListLoader; @Inject
UserTopicAnswerLoader userTopicAnswerLoader; /** * Create a repository for
topic. * * @param dsl dsl * @return repository */ public static
TopicRepository create(DSLContext dsl) { return
RepositoryV2Factory.create(TopicRepository.class, dsl, TOPIC,
TOPIC.TOPICID, TopicMapper.class); } @Override protected List<TopicDTO>
expandResult(List<TopicDTO> results, List<UUID> ids) { var topicLabels =
topicLabelListLoader.load(dsl, ids); var topicTagLangs =
topicTagLangListLoader.load(dsl, ids); var topicLangs =
topicLangListLoader.load(dsl, ids); var topicAnswers =
userTopicAnswerLoader.load(dsl, ids, this.userId); return
results.stream().map(topic -> topic .setLabels(list(topicLabels,
topic.getTopicId())) .setTagLangs(list(topicTagLangs, topic.getTopicId()))
.setLangs(list(topicLangs, topic.getTopicId()))
.setAnswer(one(topicAnswers, topic.getTopicId()))).toList(); } }
@ApplicationScoped public class TopicLabelListLoader implements
ListLoader<UUID,
LabelDTO> { @Inject LabelMapper labelMapper; @Inject LabelLangMapper
labelLangMapper; @Override public Map<UUID, List<LabelDTO>> load(DSLContext
dsl, List<UUID> ids) { return dsl.select( LABEL.LABELID, LABEL, multiset(
select(LABELLANG) .from(LABELLANG)
.where(LABELLANG.LABELID.eq(LABEL.LABELID)) ).convertFrom(r -> r.map(rec ->
labelLangMapper.map(rec))) ).from(TOPICLABEL)
.join(LABEL).on(LABEL.LABELID.eq(TOPICLABEL.LABELID))
.where(TOPICLABEL.TOPICID.in(ids)) .fetchGroups(LABEL.LABELID,
labelMapper::map); } } @ApplicationScoped public class TopicLangListLoader
implements ListLoader<UUID, TopicLangDTO> { @Inject TopicLangMapper
topicLangMapper; @Override public Map<UUID, List<TopicLangDTO>> load(DSLContext
dsl, List<UUID> ids) { return dsl.select(TOPICLANG) .from(TOPICLANG)
.where(TOPICLANG.TOPICID.in(ids)) .fetchGroups(TOPICLANG.TOPICID,
topicLangMapper::map); } } @ApplicationScoped public class
TopicTagLangListLoader implements ListLoader<UUID, TopicTagLangDTO> {
@Inject TopicTagLangMapper topicTagLangMapper; @Override public Map<UUID,
List<TopicTagLangDTO>> load(DSLContext dsl, List<UUID> ids) { return
dsl.select(TOPICTAGLANG) .from(TOPICTAGLANG)
.where(TOPICTAGLANG.TOPICID.in(ids)) .fetchGroups(TOPICTAGLANG.TOPICID,
topicTagLangMapper::map); } } @ApplicationScoped public class
UserTopicAnswerLoader { @Inject UserTopicAnswerMapper mapper; @Inject
UserTopicAnswerLangMapper userTopicAnswerLangMapper; public Map<UUID,
UserTopicAnswerDTO> load(DSLContext dsl, List<UUID> ids, UUID userId) {
return dsl.select( USERTOPICANSWER.TOPICID, USERTOPICANSWER, multiset(
select(USERTOPICANSWERLANG) .from(USERTOPICANSWERLANG)
.where(USERTOPICANSWERLANG.ANSWERID.eq(USERTOPICANSWER.ANSWERID))
).convertFrom(r -> r.map(rec -> userTopicAnswerLangMapper.map(rec)))
).from(USERTOPICANSWER) .where(USERTOPICANSWER.TOPICID.in(ids))
.and(USERTOPICANSWER.USERID.eq(userId)) .fetchMap(USERTOPICANSWER.TOPICID,
mapper::mapOrNull); } }
The Repository now is only responsible for the Fetching of nested DTOs by
the given list of Ids that the Selector has found. Each multiset has been
split into a separate call to the database via a separate "Loader"-class
that encapsulates it for reusability.
As you guys can see, while Part1 had it's performance problems it was
really nice that everything was in one big code-block and everything! the
filtering, sorting and result-generation was outsourced to the database,
and only one call to the database was necessary to execute it.
Part2 now is much more focused on splitting the code into separate
fragments and needs to interact more times with the database. First:
Loading Ids from the database, Second: Calling the database for each
involved multiset seperately with additional calls.
Just wanted to share this story of my architecture roundtrips in the group.
Maybe someone has an idea or a tip if there is something i have completely
missed.
But yeah, i love that jOOQ provides the ability to build own abstractions,
but it is often not easy to forsee if an architecture that you write on
your own is performance-safe when the number of joins and multisets are
growing over time. The example i presented here would be no problem. But in
my job we join much more tables, and also need much more results. I also
thought about trying out Views but somehow never gave it a chance until now.
--
You received this message because you are subscribed to the Google Groups "jOOQ
User Group" group.
To unsubscribe from this group and stop receiving emails from it, send an email
to [email protected].
To view this discussion visit
https://groups.google.com/d/msgid/jooq-user/9aa2723e-96a4-4317-adfd-21d3e306ec42n%40googlegroups.com.