[jira] [Updated] (CALCITE-6216) The JDBC adapter is unable to push down queries when there is a join involved in the query.

xjlem (Jira) Mon, 22 Jan 2024 03:50:05 -0800


     [ 
https://issues.apache.org/jira/browse/CALCITE-6216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


xjlem updated CALCITE-6216:
---------------------------
    Description: 
I want to know why the description of JdbcToEnumerableConverterRule in the JDBC 
rule includes "JdbcConvention out" information while others do not, which leads 
to different data sources sharing the same JDBC rule.

for example：

config:
{code:java}
{
  "defaultSchema": "db1",
  "schemas": [
    {
      "factory": "org.apache.calcite.adapter.jdbc.JdbcSchema$Factory",
      "name": "db1",
      "operand": {
        "jdbcDriver": "com.mysql.jdbc.Driver",
        "jdbcPassword": "",
        "jdbcUrl": "",
        "jdbcUser": ""
      },
      "type": "custom"
    },
    {
      "factory": "org.apache.calcite.adapter.jdbc.JdbcSchema$Factory",
      "name": "db2",
      "operand": {
        "jdbcDriver": "com.mysql.jdbc.Driver",
        "jdbcPassword": "",
        "jdbcUrl": "",
        "jdbcUser": ""
      },
      "type": "custom"
    }
  ],
  "version": "1.0"
} {code}
sql：
{code:java}
SELECT pv / sales, pv, sales
FROM (
    SELECT sku_id, COUNT(*) AS sales
    FROM db1.sku_sales
    WHERE SKU_ID = 1
    GROUP BY sku_id
) sku_sales
    FULL JOIN (
        SELECT sku_id, COUNT(pv) AS pv
        FROM db2.sku_flow
        WHERE SKU_ID = 1
        GROUP BY sku_id
    ) sku_flow
    ON sku_sales.sku_id = sku_flow.sku_id{code}
it will convert to logic relNode:
{code:java}
LogicalProject(EXPR$0=[/($3, $1)], pv=[$3], sales=[$1])
  LogicalJoin(condition=[=($0, $2)], joinType=[full])
    LogicalAggregate(group=[{0}], sales=[COUNT()])
      LogicalFilter(condition=[=($0, 1)])
        LogicalProject(sku_id=[$2])
          JdbcTableScan(table=[[db1, sku_sales]])
    LogicalAggregate(group=[{0}], pv=[COUNT($1)])
      LogicalFilter(condition=[=($0, 1)])
        LogicalProject(sku_id=[$2], pv=[$3])
          JdbcTableScan(table=[[db2, sku_flow]]) {code}
optimize:

When registering the rule, the rule for the DB2 convention will not be 
registered because the jdbcRule has the same description as the rule for the 
DB1 convention, causing the addRule method to return false. 

org.apache.calcite.adapter.jdbc.JdbcConvention#register
{code:java}
@Override public void register(RelOptPlanner planner) {
  for (RelOptRule rule : JdbcRules.rules(this)) {
    planner.addRule(rule);
  }
  planner.addRule(FilterSetOpTransposeRule.INSTANCE);
  planner.addRule(ProjectRemoveRule.INSTANCE);
} {code}
after optimize:
{code:java}
EnumerableCalc(expr#0..3=[{inputs}], expr#4=[/($t3, $t1)], EXPR$0=[$t4], 
pv=[$t3], sales=[$t1])
  EnumerableJoin(condition=[=($0, $2)], joinType=[full])
    JdbcToEnumerableConverter
      JdbcAggregate(group=[{2}], sales=[COUNT()])
        JdbcFilter(condition=[=($2, 1)])
          JdbcTableScan(table=[[db1, sku_sales]])
    EnumerableAggregate(group=[{2}], pv=[COUNT($3)])
      EnumerableCalc(expr#0..4=[{inputs}], expr#5=[1], expr#6=[=($t2, $t5)], 
proj#0..4=[{exprs}], $condition=[$t6])
        JdbcToEnumerableConverter
          JdbcTableScan(table=[[db2, sku_flow]])
 {code}
As a result, we can see that the push down for DB2 fails.

   

  was:
I want to know why the description of JdbcToEnumerableConverterRule in the JDBC 
rule includes "JdbcConvention out" information while others do not, which leads 
to different data sources sharing the same JDBC rule.

for example：

config:
{code:java}
{
  "defaultSchema": "db1",
  "schemas": [
    {
      "factory": "org.apache.calcite.adapter.jdbc.JdbcSchema$Factory",
      "name": "db1",
      "operand": {
        "jdbcDriver": "com.mysql.jdbc.Driver",
        "jdbcPassword": "",
        "jdbcUrl": "",
        "jdbcUser": ""
      },
      "type": "custom"
    },
    {
      "factory": "org.apache.calcite.adapter.jdbc.JdbcSchema$Factory",
      "name": "db2",
      "operand": {
        "jdbcDriver": "com.mysql.jdbc.Driver",
        "jdbcPassword": "",
        "jdbcUrl": "",
        "jdbcUser": ""
      },
      "type": "custom"
    }
  ],
  "version": "1.0"
} {code}
sql：
{code:java}
SELECT pv / sales, pv, sales
FROM (
    SELECT sku_id, COUNT(*) AS sales
    FROM db1.sku_sales
    WHERE SKU_ID = 1
    GROUP BY sku_id
) sku_sales
    FULL JOIN (
        SELECT sku_id, COUNT(pv) AS pv
        FROM db2.sku_flow
        WHERE SKU_ID = 1
        GROUP BY sku_id
    ) sku_flow
    ON sku_sales.sku_id = sku_flow.sku_id{code}
it will convert to logic relNode:
{code:java}
LogicalProject(EXPR$0=[/($3, $1)], pv=[$3], sales=[$1])
  LogicalJoin(condition=[=($0, $2)], joinType=[full])
    LogicalAggregate(group=[{0}], sales=[COUNT()])
      LogicalFilter(condition=[=($0, 1)])
        LogicalProject(sku_id=[$2])
          JdbcTableScan(table=[[db1, sku_sales]])
    LogicalAggregate(group=[{0}], pv=[COUNT($1)])
      LogicalFilter(condition=[=($0, 1)])
        LogicalProject(sku_id=[$2], pv=[$3])
          JdbcTableScan(table=[[db2, sku_flow]]) {code}
optimize:

when register rule the db2 convention's rule will not regist beacuse jdbcRule 
have same description with db1 convention's rule  and addRule will return false 
.

org.apache.calcite.adapter.jdbc.JdbcConvention#register
{code:java}
@Override public void register(RelOptPlanner planner) {
  for (RelOptRule rule : JdbcRules.rules(this)) {
    planner.addRule(rule);
  }
  planner.addRule(FilterSetOpTransposeRule.INSTANCE);
  planner.addRule(ProjectRemoveRule.INSTANCE);
} {code}
after optimize:
{code:java}
EnumerableCalc(expr#0..3=[{inputs}], expr#4=[/($t3, $t1)], EXPR$0=[$t4], 
pv=[$t3], sales=[$t1])
  EnumerableJoin(condition=[=($0, $2)], joinType=[full])
    JdbcToEnumerableConverter
      JdbcAggregate(group=[{2}], sales=[COUNT()])
        JdbcFilter(condition=[=($2, 1)])
          JdbcTableScan(table=[[db1, sku_sales]])
    EnumerableAggregate(group=[{2}], pv=[COUNT($3)])
      EnumerableCalc(expr#0..4=[{inputs}], expr#5=[1], expr#6=[=($t2, $t5)], 
proj#0..4=[{exprs}], $condition=[$t6])
        JdbcToEnumerableConverter
          JdbcTableScan(table=[[db2, sku_flow]])
 {code}
as we can see the db2 push down fail


   


> The JDBC adapter is unable to push down queries when there is a join involved 
> in the query.
> -------------------------------------------------------------------------------------------
>
>                 Key: CALCITE-6216
>                 URL: https://issues.apache.org/jira/browse/CALCITE-6216
>             Project: Calcite
>          Issue Type: Improvement
>          Components: jdbc-adapter
>            Reporter: xjlem
>            Priority: Major
>
> I want to know why the description of JdbcToEnumerableConverterRule in the 
> JDBC rule includes "JdbcConvention out" information while others do not, 
> which leads to different data sources sharing the same JDBC rule.
> for example：
> config:
> {code:java}
> {
>   "defaultSchema": "db1",
>   "schemas": [
>     {
>       "factory": "org.apache.calcite.adapter.jdbc.JdbcSchema$Factory",
>       "name": "db1",
>       "operand": {
>         "jdbcDriver": "com.mysql.jdbc.Driver",
>         "jdbcPassword": "",
>         "jdbcUrl": "",
>         "jdbcUser": ""
>       },
>       "type": "custom"
>     },
>     {
>       "factory": "org.apache.calcite.adapter.jdbc.JdbcSchema$Factory",
>       "name": "db2",
>       "operand": {
>         "jdbcDriver": "com.mysql.jdbc.Driver",
>         "jdbcPassword": "",
>         "jdbcUrl": "",
>         "jdbcUser": ""
>       },
>       "type": "custom"
>     }
>   ],
>   "version": "1.0"
> } {code}
> sql：
> {code:java}
> SELECT pv / sales, pv, sales
> FROM (
>     SELECT sku_id, COUNT(*) AS sales
>     FROM db1.sku_sales
>     WHERE SKU_ID = 1
>     GROUP BY sku_id
> ) sku_sales
>     FULL JOIN (
>         SELECT sku_id, COUNT(pv) AS pv
>         FROM db2.sku_flow
>         WHERE SKU_ID = 1
>         GROUP BY sku_id
>     ) sku_flow
>     ON sku_sales.sku_id = sku_flow.sku_id{code}
> it will convert to logic relNode:
> {code:java}
> LogicalProject(EXPR$0=[/($3, $1)], pv=[$3], sales=[$1])
>   LogicalJoin(condition=[=($0, $2)], joinType=[full])
>     LogicalAggregate(group=[{0}], sales=[COUNT()])
>       LogicalFilter(condition=[=($0, 1)])
>         LogicalProject(sku_id=[$2])
>           JdbcTableScan(table=[[db1, sku_sales]])
>     LogicalAggregate(group=[{0}], pv=[COUNT($1)])
>       LogicalFilter(condition=[=($0, 1)])
>         LogicalProject(sku_id=[$2], pv=[$3])
>           JdbcTableScan(table=[[db2, sku_flow]]) {code}
> optimize:
> When registering the rule, the rule for the DB2 convention will not be 
> registered because the jdbcRule has the same description as the rule for the 
> DB1 convention, causing the addRule method to return false. 
> org.apache.calcite.adapter.jdbc.JdbcConvention#register
> {code:java}
> @Override public void register(RelOptPlanner planner) {
>   for (RelOptRule rule : JdbcRules.rules(this)) {
>     planner.addRule(rule);
>   }
>   planner.addRule(FilterSetOpTransposeRule.INSTANCE);
>   planner.addRule(ProjectRemoveRule.INSTANCE);
> } {code}
> after optimize:
> {code:java}
> EnumerableCalc(expr#0..3=[{inputs}], expr#4=[/($t3, $t1)], EXPR$0=[$t4], 
> pv=[$t3], sales=[$t1])
>   EnumerableJoin(condition=[=($0, $2)], joinType=[full])
>     JdbcToEnumerableConverter
>       JdbcAggregate(group=[{2}], sales=[COUNT()])
>         JdbcFilter(condition=[=($2, 1)])
>           JdbcTableScan(table=[[db1, sku_sales]])
>     EnumerableAggregate(group=[{2}], pv=[COUNT($3)])
>       EnumerableCalc(expr#0..4=[{inputs}], expr#5=[1], expr#6=[=($t2, $t5)], 
> proj#0..4=[{exprs}], $condition=[$t6])
>         JdbcToEnumerableConverter
>           JdbcTableScan(table=[[db2, sku_flow]])
>  {code}
> As a result, we can see that the push down for DB2 fails.
>    



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

[jira] [Updated] (CALCITE-6216) The JDBC adapter is unable to push down queries when there is a join involved in the query.

Reply via email to