[ 
https://issues.apache.org/jira/browse/CALCITE-6216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17810164#comment-17810164
 ] 

Julian Hyde commented on CALCITE-6216:
--------------------------------------

As you note, JDBC rules are unique in that they have different instances for 
each backend database (custom schema). We did that deliberately, to prevent 
impossible joins from happening.
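The dedup that bites here can be sketched with a toy registry (an illustration only, not Calcite's actual planner code): a planner that keys rules by their description string silently drops the second rule when two conventions produce rules with identical descriptions.

{code:java}
import java.util.HashMap;
import java.util.Map;

// Toy stand-in for a planner's rule set, keyed by rule description.
class RuleRegistry {
  private final Map<String, Object> rulesByDescription = new HashMap<>();

  /** Returns false (rule dropped) if an equally-described rule exists. */
  boolean addRule(String description, Object rule) {
    return rulesByDescription.putIfAbsent(description, rule) == null;
  }
}

class RegistryDemo {
  public static void main(String[] args) {
    RuleRegistry planner = new RuleRegistry();
    // db1's rule registers fine:
    System.out.println(planner.addRule("JdbcFilterRule", new Object()));  // true
    // db2's rule has the same description, so it is silently dropped:
    System.out.println(planner.addRule("JdbcFilterRule", new Object()));  // false
  }
}
{code}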

I don't think that two custom schemas should be regarded as equivalent just 
because their credentials are identical. (It's analogous to [name equivalence 
vs structural 
equivalence|https://stackoverflow.com/questions/4486401/structural-equivalence-vs-name-equivalence]
 in type systems.)
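To put the analogy in Java terms (a made-up illustration, unrelated to Calcite's actual classes): two classes with identical members are still distinct, incompatible types, just as two custom schemas with identical credentials are still distinct schemas.

{code:java}
// Java uses name equivalence: these classes are structurally identical
// ("same credentials") but remain distinct, mutually incompatible types.
class Db1Schema { String jdbcUrl; String jdbcUser; }
class Db2Schema { String jdbcUrl; String jdbcUser; }

class EquivalenceDemo {
  public static void main(String[] args) {
    Object s = new Db1Schema();
    // Db2Schema other = (Db2Schema) s;  // would fail: ClassCastException
    System.out.println(s instanceof Db1Schema);  // prints "true"
    System.out.println(s instanceof Db2Schema);  // prints "false"
  }
}
{code}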

But it would be reasonable to have *some way* to declare that two custom 
schemas can be addressed from the same JDBC connection (and which prefixes to 
add to table names when generating SQL for tables in those schemas), so that 
joins are possible. Would anyone care to propose JSON syntax for that?
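As a pure strawman (the {{connectionGroup}} and {{sqlPrefix}} operand keys 
below are invented for illustration; they are not part of Calcite's model 
syntax today):

{code:java}
{
  "version": "1.0",
  "defaultSchema": "db1",
  "schemas": [
    {
      "name": "db1",
      "type": "custom",
      "factory": "org.apache.calcite.adapter.jdbc.JdbcSchema$Factory",
      "operand": {
        "jdbcDriver": "com.mysql.jdbc.Driver",
        "jdbcUrl": "",
        "connectionGroup": "mysql-main",
        "sqlPrefix": "db1"
      }
    },
    {
      "name": "db2",
      "type": "custom",
      "factory": "org.apache.calcite.adapter.jdbc.JdbcSchema$Factory",
      "operand": {
        "jdbcDriver": "com.mysql.jdbc.Driver",
        "jdbcUrl": "",
        "connectionGroup": "mysql-main",
        "sqlPrefix": "db2"
      }
    }
  ]
}
{code}

The idea would be that schemas sharing a {{connectionGroup}} are addressable 
over one JDBC connection (so joins between them can be pushed down), and 
{{sqlPrefix}} is prepended to table names in the generated SQL.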

> The JDBC adapter is unable to push down queries when a query involves 
> multiple data sources
> --------------------------------------------------------------------------------------------
>
>                 Key: CALCITE-6216
>                 URL: https://issues.apache.org/jira/browse/CALCITE-6216
>             Project: Calcite
>          Issue Type: Improvement
>          Components: jdbc-adapter
>            Reporter: xjlem
>            Priority: Major
>
> I want to know why the description of JdbcToEnumerableConverterRule 
> includes the "JdbcConvention out" information while the descriptions of the 
> other JDBC rules do not, which leads to different data sources sharing the 
> same JDBC rules.
> For example:
> config:
> {code:java}
> {
>   "defaultSchema": "db1",
>   "schemas": [
>     {
>       "factory": "org.apache.calcite.adapter.jdbc.JdbcSchema$Factory",
>       "name": "db1",
>       "operand": {
>         "jdbcDriver": "com.mysql.jdbc.Driver",
>         "jdbcPassword": "",
>         "jdbcUrl": "",
>         "jdbcUser": ""
>       },
>       "type": "custom"
>     },
>     {
>       "factory": "org.apache.calcite.adapter.jdbc.JdbcSchema$Factory",
>       "name": "db2",
>       "operand": {
>         "jdbcDriver": "com.mysql.jdbc.Driver",
>         "jdbcPassword": "",
>         "jdbcUrl": "",
>         "jdbcUser": ""
>       },
>       "type": "custom"
>     }
>   ],
>   "version": "1.0"
> } {code}
> sql:
> {code:java}
> SELECT pv / sales, pv, sales
> FROM (
>     SELECT sku_id, COUNT(*) AS sales
>     FROM db1.sku_sales
>     WHERE SKU_ID = 1
>     GROUP BY sku_id
> ) sku_sales
>     FULL JOIN (
>         SELECT sku_id, COUNT(pv) AS pv
>         FROM db2.sku_flow
>         WHERE SKU_ID = 1
>         GROUP BY sku_id
>     ) sku_flow
>     ON sku_sales.sku_id = sku_flow.sku_id{code}
> it will convert to logic relNode:
> {code:java}
> LogicalProject(EXPR$0=[/($3, $1)], pv=[$3], sales=[$1])
>   LogicalJoin(condition=[=($0, $2)], joinType=[full])
>     LogicalAggregate(group=[{0}], sales=[COUNT()])
>       LogicalFilter(condition=[=($0, 1)])
>         LogicalProject(sku_id=[$2])
>           JdbcTableScan(table=[[db1, sku_sales]])
>     LogicalAggregate(group=[{0}], pv=[COUNT($1)])
>       LogicalFilter(condition=[=($0, 1)])
>         LogicalProject(sku_id=[$2], pv=[$3])
>           JdbcTableScan(table=[[db2, sku_flow]]) {code}
> optimize:
> When the rules are registered, the rules for the db2 convention are not 
> registered: each rule produced for db2 has the same description as the 
> corresponding rule for the db1 convention, so the planner's addRule method 
> returns false. See org.apache.calcite.adapter.jdbc.JdbcConvention#register:
> {code:java}
> @Override public void register(RelOptPlanner planner) {
>   for (RelOptRule rule : JdbcRules.rules(this)) {
>     planner.addRule(rule);
>   }
>   planner.addRule(FilterSetOpTransposeRule.INSTANCE);
>   planner.addRule(ProjectRemoveRule.INSTANCE);
> } {code}
> after optimize:
> {code:java}
> EnumerableCalc(expr#0..3=[{inputs}], expr#4=[/($t3, $t1)], EXPR$0=[$t4], 
> pv=[$t3], sales=[$t1])
>   EnumerableJoin(condition=[=($0, $2)], joinType=[full])
>     JdbcToEnumerableConverter
>       JdbcAggregate(group=[{2}], sales=[COUNT()])
>         JdbcFilter(condition=[=($2, 1)])
>           JdbcTableScan(table=[[db1, sku_sales]])
>     EnumerableAggregate(group=[{2}], pv=[COUNT($3)])
>       EnumerableCalc(expr#0..4=[{inputs}], expr#5=[1], expr#6=[=($t2, $t5)], 
> proj#0..4=[{exprs}], $condition=[$t6])
>         JdbcToEnumerableConverter
>           JdbcTableScan(table=[[db2, sku_flow]])
>  {code}
> As a result, the push-down for db2 fails: the filter and aggregate over 
> db2.sku_flow are evaluated in the Enumerable convention instead of being 
> pushed to the database.
>    



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
