hi Huang
1. Reduce the interleaved project-filter to one single
project-filter or filter-project,i always use `Calc Operator` to optimize
`project-filter` or `filter-project`, Using the `Calc` operator,
relational algebra can be simplified,This relational expression combines the
functionality of `Project` and `Filter`.In Calcite, you can use
`FilterToCalcRule` and `ProjectToCalcRule`, `CalcMergeRule` may also be
used. These rules can be used by `HepPlanner`??
2. I'm not sure if you want to remove `CAST` from
RexNode,Maybe you can refer
`org.apache.calcite.rex.RexUtil#removeCast`. In the RelNode, we can
implement a `RelOptRule` to match the operator, and then remove the cast
operator.you can refer `RelOptRules` in
`org.apache.calcite.rel.rules.CoreRules`.In addition, you can use `RelShuttle`
to rewrite RelNode, both of them can remove the `cast operator`.
I hope it can help you.
Best
Xzh
------------------ Original ------------------
From:
"dev"
<[email protected]>;
Date: Mon, Sep 27, 2021 10:38 AM
To: "[email protected]"<[email protected]>;
Subject: Need help: Optimization rule to reduce
project-filter-project-filter on one table
Hi Calcite developers,
I just started learning Calcite, and tried Calcite optimization on a fake SQL
as shown in below table.
I see that generated Logical plan has interleaved Project-Filter-Project-Filter
chain on one table.
I need some help on 2 questions:
1. Does Calcite have any rule to reduce the interleaved
project-filter to one single project-filter or filter-project?
1. How can I remove the CAST operation in the SQL statement
generated from RelNode?
Thanks,
Justin
# Input fake SQL statement
select
*
from
schema_csv.table_csv
left join
(
select
BugId,
ARB,
(
BugId + BugId
)
as BugId2
from
(
select
*
from
schema_bug.table_bug
where
BugId = 100
and ARB = \'abc\'
)
where
BugId >
200741801
AND BugId <
(200751927 - 2)
)
as MyBug
on schema_csv.table_csv.BugId = MyBug.BugId
# Generated Logical plan
LogicalProject(name=[$0], age=[$1], position=[$2], BugId=[$3], BugId0=[$4],
ARB=[$5], BugId2=[$6])
LogicalJoin(condition=[=($3, $4)], joinType=[left])
LogicalTableScan(table=[[schema_csv, table_csv]])
LogicalProject(BugId=[$0],
ARB=[CAST('abc':VARCHAR):VARCHAR], BugId2=[+($0, $0)])
LogicalFilter(condition=[AND(>($0,
200741801), <($0, -(200751927, 2)))])
LogicalProject(BugId=[CAST(100:BIGINT):BIGINT],
ARB=[CAST('abc':VARCHAR):VARCHAR])
LogicalFilter(condition=[AND(=($0, 100), =($1, 'abc'))])
LogicalTableScan(table=[[schema_bug, table_bug]])
# Generated sql (HiveSqlDialect) from the LogicalProject input of LogicalJoin
SELECT BugId, CAST('abc' AS VARCHAR) ARB, BugId + BugId BugId2 FROM (SELECT
CAST(100 AS BIGINT) BugId, CAST('abc' AS VARCHAR) ARB FROM table_bug WHERE
BugId = 100 AND ARB = 'abc') t0 WHERE BugId > 200741801 AND BugId <
200751927 - 2