[
https://issues.apache.org/jira/browse/CALCITE-3368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Leonard Xu updated CALCITE-3368:
--------------------------------
Description:
'is null' expression in SQL may be optimized incorrectly in the underlying
implementation.
When I write a Fink SQL to test overflow just like
{code:java}
select
case when (f0 + f1) is null then 'null' else 'not null' end
from testTable
{code}
, I found expression '(f0 + f1) is null ' has been optimized by Calcite, and
the optimization may be incorrect.
The underlying implementation is that Calcite's simplification logic of isNull
expression in SQL will convert from
*"f(operand0, operand1) IS NULL"* to
*"operand0 IS NULL OR operand1 IS NULL"* if the Policy of RexNode‘s SqlKind
is ANY。
This simplification leads to the expression will not calculate the real
value of *f(operand0, operand1)* (eg.. '(f0 + f1)' in my case ),but '(f0 +
f1)' maybe overflows after operation.
{code:java}
//org.apache.calcite.rex.RexSimplify.java
private RexNode simplifyIsNull(RexNode a) {
// Simplify the argument first,
// call ourselves recursively to see whether we can make more progress.
// For example, given
// "(CASE WHEN FALSE THEN 1 ELSE 2) IS NULL" we first simplify the
// argument to "2", and only then we can simplify "2 IS NULL" to "FALSE".
a = simplify(a, UNKNOWN);
if (!a.getType().isNullable() && isSafeExpression(a)) {
return rexBuilder.makeLiteral(false);
}
if (RexUtil.isNull(a)) {
return rexBuilder.makeLiteral(true);
}
if (a.getKind() == SqlKind.CAST) {
return null;
}
switch (Strong.policy(a.getKind())) {
case NOT_NULL:
return rexBuilder.makeLiteral(false);
case ANY:
// "f" is a strong operator, so "f(operand0, operand1) IS NULL" simplifies
// to "operand0 IS NULL OR operand1 IS NULL"
final List<RexNode> operands = new ArrayList<>();
for (RexNode operand : ((RexCall) a).getOperands()) {
final RexNode simplified = simplifyIsNull(operand);
if (simplified == null) {
operands.add(
rexBuilder.makeCall(SqlStdOperatorTable.IS_NULL, operand));
} else {
operands.add(simplified);
}
}
return RexUtil.composeDisjunction(rexBuilder, operands, false);
case AS_IS:
default:
return null;
}
}{code}
And most of calculating SqlKinds are assigned *Policy.ANY* at present.
{code:java}
//org.apache.calcite.plan.Strong.java
public static Policy policy(SqlKind kind) {
return MAP.getOrDefault(kind, Policy.AS_IS);
}
....
map.put(SqlKind.PLUS, Policy.ANY);
map.put(SqlKind.PLUS_PREFIX, Policy.ANY);
map.put(SqlKind.MINUS, Policy.ANY);
map.put(SqlKind.MINUS_PREFIX, Policy.ANY);
map.put(SqlKind.TIMES, Policy.ANY);
map.put(SqlKind.DIVIDE, Policy.ANY);
* that operator evaluates to null. */
public enum Policy {
/** This kind of expression is never null. No need to look at its arguments,
* if it has any. */
NOT_NULL,
/** This kind of expression has its own particular rules about whether it
* is null. */
CUSTOM,
/** This kind of expression is null if and only if at least one of its
* arguments is null. */
ANY,
/** This kind of expression may be null. There is no way to rewrite. */
AS_IS,
}{code}
It may be an obvious nonequivalent simplification in SQL. And this issue come
from Flink (FLINK-14030).
[~danny0405], Could you have a look at this?
was:
'is null' expression in SQL may be optimized incorrectly in the underlying
implementation.
When I test a Fink SQL to check overflow situation like
{code:java}
select
case when (f0 + f1) is null then 'null' else 'not null' end
from testTable
{code}
, I found expression '(f0 + f1) is null ' has been optimized by Calcite, and
the optimization may be incorrect.
The und
The Calcite's simplification logic of isNull expression in SQL will convert
from
*"f(operand0, operand1) IS NULL"* to
*"operand0 IS NULL OR operand1 IS NULL"* if the Policy of RexNode‘s SqlKind
is ANY。
This simplification leads to the expression will not calculate the real
value of *f(operand0, operand1)* (eg..'f0 + 'f1 )which maybe overflow.
{code:java}
//org.apache.calcite.rex.RexSimplify.java
private RexNode simplifyIsNull(RexNode a) {
// Simplify the argument first,
// call ourselves recursively to see whether we can make more progress.
// For example, given
// "(CASE WHEN FALSE THEN 1 ELSE 2) IS NULL" we first simplify the
// argument to "2", and only then we can simplify "2 IS NULL" to "FALSE".
a = simplify(a, UNKNOWN);
if (!a.getType().isNullable() && isSafeExpression(a)) {
return rexBuilder.makeLiteral(false);
}
if (RexUtil.isNull(a)) {
return rexBuilder.makeLiteral(true);
}
if (a.getKind() == SqlKind.CAST) {
return null;
}
switch (Strong.policy(a.getKind())) {
case NOT_NULL:
return rexBuilder.makeLiteral(false);
case ANY:
// "f" is a strong operator, so "f(operand0, operand1) IS NULL" simplifies
// to "operand0 IS NULL OR operand1 IS NULL"
final List<RexNode> operands = new ArrayList<>();
for (RexNode operand : ((RexCall) a).getOperands()) {
final RexNode simplified = simplifyIsNull(operand);
if (simplified == null) {
operands.add(
rexBuilder.makeCall(SqlStdOperatorTable.IS_NULL, operand));
} else {
operands.add(simplified);
}
}
return RexUtil.composeDisjunction(rexBuilder, operands, false);
case AS_IS:
default:
return null;
}
}{code}
And most of calculating SqlKinds are assigned *Policy.ANY* at present.
{code:java}
//org.apache.calcite.plan.Strong.java
public static Policy policy(SqlKind kind) {
return MAP.getOrDefault(kind, Policy.AS_IS);
}
....
map.put(SqlKind.PLUS, Policy.ANY);
map.put(SqlKind.PLUS_PREFIX, Policy.ANY);
map.put(SqlKind.MINUS, Policy.ANY);
map.put(SqlKind.MINUS_PREFIX, Policy.ANY);
map.put(SqlKind.TIMES, Policy.ANY);
map.put(SqlKind.DIVIDE, Policy.ANY);
* that operator evaluates to null. */
public enum Policy {
/** This kind of expression is never null. No need to look at its arguments,
* if it has any. */
NOT_NULL,
/** This kind of expression has its own particular rules about whether it
* is null. */
CUSTOM,
/** This kind of expression is null if and only if at least one of its
* arguments is null. */
ANY,
/** This kind of expression may be null. There is no way to rewrite. */
AS_IS,
}{code}
It may be an obvious nonequivalent simplification in SQL.
And this issue come from Flink (FLINK-14030).
[~danny0405], Could you have a look at this?
> 'is null' expression in SQL may be optimized incorrectly in the underlying
> implementation
> -----------------------------------------------------------------------------------------
>
> Key: CALCITE-3368
> URL: https://issues.apache.org/jira/browse/CALCITE-3368
> Project: Calcite
> Issue Type: Bug
> Components: core
> Affects Versions: 1.21.0
> Reporter: Leonard Xu
> Assignee: Danny Chan
> Priority: Major
>
> 'is null' expression in SQL may be optimized incorrectly in the underlying
> implementation.
>
> When I write a Fink SQL to test overflow just like
> {code:java}
> select
> case when (f0 + f1) is null then 'null' else 'not null' end
> from testTable
> {code}
> , I found expression '(f0 + f1) is null ' has been optimized by Calcite, and
> the optimization may be incorrect.
>
> The underlying implementation is that Calcite's simplification logic of
> isNull expression in SQL will convert from
> *"f(operand0, operand1) IS NULL"* to
> *"operand0 IS NULL OR operand1 IS NULL"* if the Policy of RexNode‘s SqlKind
> is ANY。
> This simplification leads to the expression will not calculate the real
> value of *f(operand0, operand1)* (eg.. '(f0 + f1)' in my case ),but '(f0 +
> f1)' maybe overflows after operation.
> {code:java}
> //org.apache.calcite.rex.RexSimplify.java
> private RexNode simplifyIsNull(RexNode a) {
> // Simplify the argument first,
> // call ourselves recursively to see whether we can make more progress.
> // For example, given
> // "(CASE WHEN FALSE THEN 1 ELSE 2) IS NULL" we first simplify the
> // argument to "2", and only then we can simplify "2 IS NULL" to "FALSE".
> a = simplify(a, UNKNOWN);
> if (!a.getType().isNullable() && isSafeExpression(a)) {
> return rexBuilder.makeLiteral(false);
> }
> if (RexUtil.isNull(a)) {
> return rexBuilder.makeLiteral(true);
> }
> if (a.getKind() == SqlKind.CAST) {
> return null;
> }
> switch (Strong.policy(a.getKind())) {
> case NOT_NULL:
> return rexBuilder.makeLiteral(false);
> case ANY:
> // "f" is a strong operator, so "f(operand0, operand1) IS NULL" simplifies
> // to "operand0 IS NULL OR operand1 IS NULL"
> final List<RexNode> operands = new ArrayList<>();
> for (RexNode operand : ((RexCall) a).getOperands()) {
> final RexNode simplified = simplifyIsNull(operand);
> if (simplified == null) {
> operands.add(
> rexBuilder.makeCall(SqlStdOperatorTable.IS_NULL, operand));
> } else {
> operands.add(simplified);
> }
> }
> return RexUtil.composeDisjunction(rexBuilder, operands, false);
> case AS_IS:
> default:
> return null;
> }
> }{code}
> And most of calculating SqlKinds are assigned *Policy.ANY* at present.
> {code:java}
> //org.apache.calcite.plan.Strong.java
> public static Policy policy(SqlKind kind) {
> return MAP.getOrDefault(kind, Policy.AS_IS);
> }
> ....
> map.put(SqlKind.PLUS, Policy.ANY);
> map.put(SqlKind.PLUS_PREFIX, Policy.ANY);
> map.put(SqlKind.MINUS, Policy.ANY);
> map.put(SqlKind.MINUS_PREFIX, Policy.ANY);
> map.put(SqlKind.TIMES, Policy.ANY);
> map.put(SqlKind.DIVIDE, Policy.ANY);
> * that operator evaluates to null. */
> public enum Policy {
> /** This kind of expression is never null. No need to look at its arguments,
> * if it has any. */
> NOT_NULL,
> /** This kind of expression has its own particular rules about whether it
> * is null. */
> CUSTOM,
> /** This kind of expression is null if and only if at least one of its
> * arguments is null. */
> ANY,
> /** This kind of expression may be null. There is no way to rewrite. */
> AS_IS,
> }{code}
>
> It may be an obvious nonequivalent simplification in SQL. And this issue come
> from Flink (FLINK-14030).
> [~danny0405], Could you have a look at this?
>
--
This message was sent by Atlassian Jira
(v8.3.4#803005)