Hello guys, 

jOOQ provides great power, and with great power comes great responsibility 
:)

I happily implemented a Repository Abstraction for my Team to use,  but it 
currently shows it's limits that result in bad performance, which is mainly 
resulting from my inability to understand specific performance 
implications, when writing SQL for my target database (mariadb).

I don't have a specific question but seek a bit for a guide / cookbook for 
writing performant SQL (via jOOQ DSL) for the business use case of a 
Repository with remote-paginated query-functionality (filter, sort, keyset 
pagination, ...).

My initial abstraction expected the users to write everything in one big 
query, that uses multisets and joins. The joins are needed for filtering 
etc, and the multisets to get the complex results. For example:


*Repository Query Abstraction - Part 1 (Performance Problems)*
@Dependent public class TopicRepository extends AbstractRepository<TopicDTO, 
UUID> { @Inject TopicMapper topicMapper; @Inject TopicLangMapper 
topicLangMapper; @Inject TopicTagLangMapper topicTagLangMapper; @Inject 
LabelMapper labelMapper; @Inject LabelLangMapper labelLangMapper; @Inject 
UserTopicAnswerMapper userTopicAnswerMapper; @Inject 
UserTopicAnswerLangMapper userTopicAnswerLangMapper; private boolean 
onlyUntagged = false; // Default constructor for CDI public TopicRepository
() { super(TOPIC.TOPICID); } public TopicRepository onlyUntagged(boolean 
onlyUntagged) { this.onlyUntagged = onlyUntagged; return this; } @Override 
protected SelectFinalStep<Record1<TopicDTO>> prepareQuery(XQuery query) 
throws InvalidDataException { final QueryJooqMapper queryJooqMapper = new 
QueryJooqMapper(query, TOPIC) .addMappableFields(TOPIC) 
.addMappableFields(TOPICLANG) .addMappableFields(TOPICLABEL) 
.addMappableFields(LABEL) .addMappableFields(LABELLANG) 
.addMappableFields(USER) .addMappableFields(TOPICTAGLANG) 
.addMappableFields(USERTOPICANSWER) 
.addMappableFields(USERTOPICANSWERLANG); UUID userId = null; if (this.request() 
instanceof UserRequestContext) { userId = ((UserRequestContext) 
this.request()).getUserId(); 
} final Condition onlyUntaggedCondition = TOPIC.TAGSGENERATEDAT.isNull(); 
return dsl() .select( row( TOPIC, multiset( select( LABEL, multiset( 
select(LABELLANG) .from(LABELLANG) 
.where(LABELLANG.LABELID.eq(LABEL.LABELID)) ).convertFrom(r -> r.map(rec -> 
labelLangMapper.map(rec))) ).from(TOPICLABEL) 
.join(LABEL).on(LABEL.LABELID.eq(TOPICLABEL.LABELID)) 
.where(TOPICLABEL.TOPICID.eq(TOPIC.TOPICID)) ).as("multisetLabels") 
.convertFrom(r -> r.map(rec -> labelMapper.map(rec))), multiset( 
select(TOPICTAGLANG) .from(TOPICTAGLANG) 
.where(TOPICTAGLANG.TOPICID.eq(TOPIC.TOPICID)) ).as("multisetTags") 
.convertFrom(r -> r.map(rec -> topicTagLangMapper.map(rec))), multiset( 
select(TOPICLANG) .from(TOPICLANG) 
.where(TOPICLANG.TOPICID.eq(TOPIC.TOPICID)) ).as("multisetLangs") 
.convertFrom(r -> r.map(rec -> topicLangMapper.map(rec))), row( 
USERTOPICANSWER, multiset( select(USERTOPICANSWERLANG) 
.from(USERTOPICANSWERLANG) 
.where(USERTOPICANSWERLANG.ANSWERID.eq(USERTOPICANSWER.ANSWERID)) 
).convertFrom(r -> r.map(rec -> userTopicAnswerLangMapper.map(rec))) 
).convertFrom(r -> userTopicAnswerMapper.mapOrNull(r)) ).convertFrom(rec -> 
topicMapper.map(rec)) ) .from(TOPIC) 
.join(TOPICLANG).on(TOPICLANG.TOPICID.eq(TOPIC.TOPICID)) 
.leftJoin(USERTOPICANSWER).on(USERTOPICANSWER.TOPICID.eq(TOPIC.TOPICID).and(USERTOPICANSWER.USERID.eq(userId)))
 
.where(queryJooqMapper.getFilters()) .and(onlyUntagged ? 
onlyUntaggedCondition : DSL.trueCondition()) .groupBy(TOPIC.TOPICID) 
.orderBy(queryJooqMapper.getSorter()) .offset(queryJooqMapper.getOffset()) 
.limit(queryJooqMapper.getLimit()); } } 
​

After testing a bit, i found that following two points are mainly resulting 
in performance problems:
- many unbound joins (meaning that a join is made without further filtering 
on the joined table)
- many multisets that mariadb does not seem to apply at the very end on the 
already filtered results (unclear)


*Repository Query Abstraction - Part 2 (Splitting the Repository in Two)*
I thought about how to solve this, and it seems that a split in Two results 
in a overall better splitting that my development team could work with 
without needing to be an expert. For example:

First Part: Only execute query to get a list of Ids (Page) 
@Dependent @Data @Accessors(chain = true) @EqualsAndHashCode(callSuper = 
true) public class TopicSelector extends AbstractSelector<TopicRecord, 
UUID> { boolean onlyUntagged; UUID userId; /** * Create a selector for 
topic. * * @param dsl dsl * @param query query * @param onlyUntagged 
onlyUntagged * @param userId userId * @return selector */ public static 
TopicSelector create(DSLContext dsl, XQuery query, boolean onlyUntagged, 
UUID userId) { return SelectorFactory.create(TopicSelector.class, dsl, 
query, TOPIC, TOPIC.TOPICID, DOR.getTables()) 
.setOnlyUntagged(onlyUntagged) .setUserId(userId); } @Override protected 
Map<String, Field<?>> configQueryFields() { return Map.ofEntries( entry(
"title", TOPICLANG.TITLE), entry("langCode", TOPICLANG.LANGCODE), entry(
"labels", TOPICLABEL.LABELID), entry("answered", USERTOPICANSWER.ANSWERID) 
); } @Override protected List<Join> configJoins() { return List.of( 
join(TOPICLANG).on(TOPICLANG.TOPICID.eq(TOPIC.TOPICID)), 
leftJoin(TOPICLABEL).on(TOPICLABEL.TOPICID.eq(TOPIC.TOPICID)), 
leftJoin(USERTOPICANSWER).on(USERTOPICANSWER.TOPICID.eq(TOPIC.TOPICID) 
.and(USERTOPICANSWER.USERID.eq(userId))) ); } @Override protected 
List<Condition> configConditions() { return List.of(onlyUntagged ? 
TOPIC.TAGSGENERATEDAT.isNull() : DSL.trueCondition()); } } 
​

The result of this Selector will be a List of Ids of the Main-Table that 
the Selector works on (here: TOPIC), filtered and sorted by a given "query" 
object that contains query-fields which are mapped to the corresponding 
fields (TOPICLANG.TITLE, TOPICLANG.LANGCODE, ...), and are applied on the 
configured joins with the configured additional conditions.

Second Part: Getting hydrated/extended DTOs for the given List of Ids 
(Result of Part1)
@Dependent @Data @Accessors(chain = true) @EqualsAndHashCode(callSuper = 
true) public class TopicRepository extends AbstractRepositoryV2<TopicDTO, 
TopicRecord, UUID> { @Inject TopicLabelListLoader topicLabelListLoader; 
@Inject TopicLangListLoader topicLangListLoader; @Inject 
TopicTagLangListLoader topicTagLangListLoader; @Inject 
UserTopicAnswerLoader userTopicAnswerLoader; /** * Create a repository for 
topic. * * @param dsl dsl * @return repository */ public static 
TopicRepository create(DSLContext dsl) { return 
RepositoryV2Factory.create(TopicRepository.class, dsl, TOPIC, 
TOPIC.TOPICID, TopicMapper.class); } @Override protected List<TopicDTO> 
expandResult(List<TopicDTO> results, List<UUID> ids) { var topicLabels = 
topicLabelListLoader.load(dsl, ids); var topicTagLangs = 
topicTagLangListLoader.load(dsl, ids); var topicLangs = 
topicLangListLoader.load(dsl, ids); var topicAnswers = 
userTopicAnswerLoader.load(dsl, ids, this.userId); return 
results.stream().map(topic -> topic .setLabels(list(topicLabels, 
topic.getTopicId())) .setTagLangs(list(topicTagLangs, topic.getTopicId())) 
.setLangs(list(topicLangs, topic.getTopicId())) 
.setAnswer(one(topicAnswers, topic.getTopicId()))).toList(); } } 
@ApplicationScoped public class TopicLabelListLoader implements 
ListLoader<UUID, 
LabelDTO> { @Inject LabelMapper labelMapper; @Inject LabelLangMapper 
labelLangMapper; @Override public Map<UUID, List<LabelDTO>> load(DSLContext 
dsl, List<UUID> ids) { return dsl.select( LABEL.LABELID, LABEL, multiset( 
select(LABELLANG) .from(LABELLANG) 
.where(LABELLANG.LABELID.eq(LABEL.LABELID)) ).convertFrom(r -> r.map(rec -> 
labelLangMapper.map(rec))) ).from(TOPICLABEL) 
.join(LABEL).on(LABEL.LABELID.eq(TOPICLABEL.LABELID)) 
.where(TOPICLABEL.TOPICID.in(ids)) .fetchGroups(LABEL.LABELID, 
labelMapper::map); } } @ApplicationScoped public class TopicLangListLoader 
implements ListLoader<UUID, TopicLangDTO> { @Inject TopicLangMapper 
topicLangMapper; @Override public Map<UUID, List<TopicLangDTO>> load(DSLContext 
dsl, List<UUID> ids) { return dsl.select(TOPICLANG) .from(TOPICLANG) 
.where(TOPICLANG.TOPICID.in(ids)) .fetchGroups(TOPICLANG.TOPICID, 
topicLangMapper::map); } } @ApplicationScoped public class 
TopicTagLangListLoader implements ListLoader<UUID, TopicTagLangDTO> { 
@Inject TopicTagLangMapper topicTagLangMapper; @Override public Map<UUID, 
List<TopicTagLangDTO>> load(DSLContext dsl, List<UUID> ids) { return 
dsl.select(TOPICTAGLANG) .from(TOPICTAGLANG) 
.where(TOPICTAGLANG.TOPICID.in(ids)) .fetchGroups(TOPICTAGLANG.TOPICID, 
topicTagLangMapper::map); } } @ApplicationScoped public class 
UserTopicAnswerLoader { @Inject UserTopicAnswerMapper mapper; @Inject 
UserTopicAnswerLangMapper userTopicAnswerLangMapper; public Map<UUID, 
UserTopicAnswerDTO> load(DSLContext dsl, List<UUID> ids, UUID userId) { 
return dsl.select( USERTOPICANSWER.TOPICID, USERTOPICANSWER, multiset( 
select(USERTOPICANSWERLANG) .from(USERTOPICANSWERLANG) 
.where(USERTOPICANSWERLANG.ANSWERID.eq(USERTOPICANSWER.ANSWERID)) 
).convertFrom(r -> r.map(rec -> userTopicAnswerLangMapper.map(rec))) 
).from(USERTOPICANSWER) .where(USERTOPICANSWER.TOPICID.in(ids)) 
.and(USERTOPICANSWER.USERID.eq(userId)) .fetchMap(USERTOPICANSWER.TOPICID, 
mapper::mapOrNull); } } 
​

The Repository now is only responsible for the Fetching of nested DTOs by 
the given list of Ids that the Selector has found. Each multiset has been 
split into a separate call to the database via a separate "Loader"-class 
that encapsulates it for reusability.

As you guys can see, while Part1 had it's performance problems it was 
really nice that everything was in one big code-block and everything! the 
filtering, sorting and result-generation was outsourced to the database, 
and only one call to the database was necessary to execute it.

Part2 now is much more focused on splitting the code into separate 
fragments and needs to interact more times with the database. First: 
Loading Ids from the database, Second: Calling the database for each 
involved multiset seperately with additional calls.

Just wanted to share this story of my architecture roundtrips in the group. 
Maybe someone has an idea or a tip if there is something i have completely 
missed.

But yeah, i love that jOOQ provides the ability to build own abstractions, 
but it is often not easy to forsee if an architecture that you write on 
your own is performance-safe when the number of joins and multisets are 
growing over time. The example i presented here would be no problem. But in 
my job we join much more tables, and also need much more results. I also 
thought about trying out Views but somehow never gave it a chance until now.

-- 
You received this message because you are subscribed to the Google Groups "jOOQ 
User Group" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To view this discussion visit 
https://groups.google.com/d/msgid/jooq-user/9aa2723e-96a4-4317-adfd-21d3e306ec42n%40googlegroups.com.

Reply via email to