Why are JOIN/correlation predicates not pushed down as IN lists across JDBC datasources?

Gavin Ray Fri, 16 May 2025 06:53:09 -0700

Suppose I have two Calcite schemas from JDBC sources.

One contains `users` and the other `user_todos`.


If I run a query like:

    SELECT u.name, t.todo
    FROM USERSDB.users u
    LEFT JOIN TODOSDB.user_todos t ON u.id = t.user_id
    GROUP BY u.id, u.name, t.todo

What seems like the most logical thing to do would be to run:

    SELECT u.name
    FROM USERSDB.users u

And then consecutively:

    SELECT t.todo
    FROM TODOSDB.user_todos
    WHERE t.user_id IN (...)

But what happens is that the entire `user_todos` is scanned:

    /**/stat2.execute("SELECT \"USER_ID\", \"TODO\"\nFROM
\"USER_TODOS\"\nORDER BY \"USER_ID\" NULLS LAST");
    2025-05-16 09:17:54.191653-04:00 jdbc[10]: Plan       : calculate cost
for plan [PUBLIC.USER_TODOS]
    2025-05-16 09:17:54.191766-04:00 jdbc[10]: Plan       :   for table
filter PUBLIC.USER_TODOS
    2025-05-16 09:17:54.192225-04:00 jdbc[10]: Table      :     potential
plan item cost 10,230 index PUBLIC.USER_TODOS.tableScan
    2025-05-16 09:17:54.192361-04:00 jdbc[10]: Table      :     potential
plan item cost 12,240 index PUBLIC.PRIMARY_KEY_C
    2025-05-16 09:17:54.192465-04:00 jdbc[10]: Plan       :   best plan
item cost 10,230 index PUBLIC.USER_TODOS.tableScan

Is this intentional?
Why not push down JOIN predicates or correlative predicates to the JDBC
source?

Why are JOIN/correlation predicates not pushed down as IN lists across JDBC datasources?

Reply via email to