Hi Thorsten,

Thanks for your message. This is a tricky question. Ideally, you should run
a near-production style benchmark comparing the alternatives for 1) your
hardware, 2) your RDBMS versions, 3) your data set.

The benchmark technique I usually recommend is this simple approach here,
where you can use two pgplsql loops doing the same logical work with
different approaches:
https://www.jooq.org/benchmark

When it comes to comparing IN lists with ANY(?) using array binds, I've
benchmarked this for Oracle and PostgreSQL specifically, finding that
starting with ~64 binds, the = ANY(?) array parameter starts to outperform
the IN list, on my hardware, my RDBMS version, and my data set:
https://blog.jooq.org/2017/03/30/sql-in-predicate-with-in-list-or-with-array-which-is-faster/


jOOQ supports either syntax, so you can write a jOOQ based benchmark as
well if you like.

Answering your different points:

On Mon, Aug 31, 2020 at 9:45 AM Thorsten Schöning <[email protected]>
wrote:

> Additionally I'm mostly embedding the
> IDs as ","-seperated list into the query directly, e.g. because I
> already hit a limitation of ~32k parameters of the JDBC-driver[1] for
> Postgres.
>

jOOQ should protect you from that limit:
https://www.jooq.org/doc/3.14/manual/sql-building/dsl-context/custom-settings/settings-inline-threshold/


But of course, 32k parameters is mostly a problem for any JDBC driver
and/or server, so it's not a bad idea to avoid this situation. In
PostgreSQL's particular case, you don't profit from using bind variables in
such cases, because PostgreSQL's execution plan cache is very simple,
compared to e.g. Oracle's or SQL Server's, so you might as well inline all
those values, if you're not going to use any of the other options (see
below)


> Do you know of any obvious problem in jOOQ rendering such queries?


Yes. Regrettably, large IN lists are always a problem with almost all
RDBMS. I've seen Oracle production databases squeal because of cursor cache
contention issues (something you won't run into in PostgreSQL), because of
this.


> Is there some reasonable size limit for queries in terms of plain textual
> size to render or number of values or alike?


You need to find out the "reasonable" thresholds empirically.


> Is it even likely at all that jOOQ will become a problem e.g. compared to
> what the JDBC-driver
> does with the query itself already?
>

I don't think there's a problem in jOOQ here. The problems are in the JDBC
driver (or the wire protocol, more specifically, the driver itself
shouldn't have a problem), and in the server.

Note there are a few options that most people overlook in these cases:

   1. Ideally, you shouldn't have large IN lists, but reproduce the query
   that produced all those IDs in the first place, and semi-join that. E.g.
   SELECT * FROM t WHERE id IN (SELECT id FROM ...), where the subquery is
   some previous query
   2. Even better, change the business logic itself. Do you really need the
   large IN list? In my experience, these lists appear when users are
   presented with long lists of checkboxes, and the only reason why anyone
   would get such a long list is because the user selected "check all". In
   case of which case 1) applies. Perhaps the feature can be avoided entirely?
   3. In some cases, a temporary table containing all these IDs can help
   (do benchmark). You'd have to batch-insert all the IDs to that table, and
   then again semi-join it like SELECT * FROM t WHERE id IN (SELECT id FROM
   temp_table)
   4. Again, the array bind variable SELECT * FROM t WHERE id = ANY(?) is a
   reasonable choice in PostgreSQL

Cheers,
Lukas

-- 
You received this message because you are subscribed to the Google Groups "jOOQ 
User Group" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To view this discussion on the web visit 
https://groups.google.com/d/msgid/jooq-user/CAB4ELO61fPL7DZvowoOZtgoT1txYLJvLCvJTdAgo9M%2BiEbS%3DmQ%40mail.gmail.com.

Reply via email to