Re: Optimizing selectFrom(x).fetch(y) queries

Lukas Eder Fri, 10 Mar 2017 11:37:58 -0800

2017-03-10 19:04 GMT+01:00 <[email protected]>:

> Yes, I figured that is why you didn't implement this already. I was hoping
> that there were a set of heuristics that could identify if the optimization
> was possible. If the cases you mentioned were detected in the AST, then it
> would disable the reduction. But that could be error prone as perhaps too
> subtle.
>


There would certainly be room for heuristics. Just like any SQL optimiser
"heuristically" performs some SQL transformations (such as predicate push
down) without even looking at the cost model.

But it's a huge topic and if we start doing this, we'll have to go "all in"
on automatic SQL transformations. It would no longer be the same jOOQ. The
jOOQ where the SQL being generated is exactly the SQL the user intended to
be generated.

Besides, the complexity of getting this right for all 21 RDBMS (as I said,
derived tables are costly in some databases, but not in others) is very
big. So, I'd rather not start this project (yet).

Especially given the fact that this could probably be improved by improving
the API design, only, making it harder for users to make such "mistakes".

Nonetheless, I've recently had a very interesting discussion with a
participant of my SQL training who asked me why there's no tool (like an
IDE) that allows SQL developers to explore a set of alternative, equivalent
SQL queries to the ones they are writing, comparing potential costs, and
helping them refactor things. Imagine unnesting a nested select, or
transforming an inner join to a semi join, or a correlated subquery to a
derived table, etc. That might be a quite interesting feature for jOOQ, and
in fact, I might go in that direction once the new org.jooq.Parser API is
more stable - because then, jOOQ could do that also for string-based SQL
queries!

Alternatively it seems like a good candidate for a static analysis rule,
> like PMD. That would be to detect and hint that it may be requesting too
> much data and recommend a reduced query or suppressing the warning.
>

I think it would be hard to perform static analysis here, but perhaps
runtime analysis. I've been thinking about a jOOQ/SQL lint module that can
be turned on, which collects such statistics and logs some hints /
warnings. In this case, we might be able to collect statistics about
queries and record usage to see if there are situations where columns are
selected but never used in the client.

Of course, this too is quite a project - but it might be very interesting -
because it would be much more generally applicable than just optimising the
relatively specific fetch(Field) method. We would be able to find out that
2 out of 4 columns of a query like this one aren't needed:

Result<?> result = ctx.select(T.A, T.B, T.C, T.D).from(T).fetch();
...

result.getValues(T.A);

result.getValues(T.B);
...

// Result goes out of scope. Only A and B were ever consumed.


I've had this idea a couple of times now, but never wrote it down. Here's
the relevant feature request now, to collect more ideas:
https://github.com/jOOQ/jOOQ/issues/5960

There might not be any good solutions in the library, but its a common
> oversight for simple queries. It would be nice to encourage users to do
> better, or try to capture their intent. In the end of the day the slow
> query log is still the best (and ignored) tool to fix these.
>

The problem (as I mentioned in my article) is the fact that a query isn't
going to be really slow just because of SELECT *. It may be 3x slower, but
going from 0.1ms to 0.3ms isn't noticeable. The problem is that this kind
of constant overhead generates a lot of system load in total. It's more of
a throughput problem than a performance problem - the latter being much
easier to tune.

Cheers,
Lukas

-- 
You received this message because you are subscribed to the Google Groups "jOOQ 
User Group" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
For more options, visit https://groups.google.com/d/optout.

Re: Optimizing selectFrom(x).fetch(y) queries

Reply via email to