Ruben Q L created CALCITE-7635:
----------------------------------

             Summary: Simplification result of conjunction of comparisons 
depends on terms order
                 Key: CALCITE-7635
                 URL: https://issues.apache.org/jira/browse/CALCITE-7635
             Project: Calcite
          Issue Type: Bug
          Components: core
    Affects Versions: 1.42.0
            Reporter: Ruben Q L


Calling RexSimplify with unknown=FALSE on this expression:
{noformat}
x < 10 AND x > 0 AND x < 20
i.e.
AND( <(x, 10), >(x, 0), <(x, 20) )
{noformat}
Should return {{x < 10 AND x > 0}}
And this happens if we run the following test inside RexProgramTest 
(simplification result is returned on SEARCH format):
{code:java}
  @Test void testSimplifyAndComparison() {
    checkSimplifyFilter(
        and(
            gt(vInt(), literal(0)),
            lt(vInt(), literal(10)),
            lt(vInt(), literal(20))),
        "SEARCH(?0.int0, Sarg[(0..10)])"); // runs OK
  }
{code}
However, if we alter the order of the terms, and run instead:
{code:java}
  @Test void testSimplifyAndComparison() {
    checkSimplifyFilter(
        and(
            lt(vInt(), literal(10)),
            gt(vInt(), literal(0)),
            lt(vInt(), literal(20))),
        "SEARCH(?0.int0, Sarg[(0..10)])"); // fails
  }
{code}
If fails with:
{noformat}
Expected: with toString() is "SEARCH(?0.int0, Sarg[(0..10)])"
     but: toString() was "SEARCH(?0.int0, Sarg[(0..10); NULL AS FALSE])"
{noformat}

The problem is that, in the second case, the simplification result (before 
being converted into a SEARCH) is not {{x < 10 AND x > 0}} , but {{x < 10 AND x 
> 0 AND X IS NOT NULL}}; this extra IS NOT NULL turns the final SEARCH 
expression into "NULL AS FALSE" (instead of the default "NULL AS UNKNOWN" that 
we got on the first case). This "NULL AS FALSE" will make that, if this SEARCH 
expression gets later expanded, we'll get again {{x < 10 AND x > 0 AND X IS NOT 
NULL}}; instead of the expected {{x < 10 AND x > 0}};

The problem originates on {{RexSimplify#simplifyAnd}}, with unknownAs == FALSE 
and predicateElimination (be default true), we get into {{simplifyAndTerms}}:
{code:java}}
RexNode simplifyAnd(RexCall e, RexUnknownAs unknownAs) {
    List<RexNode> operands = RelOptUtil.conjunctions(e);
    if (unknownAs == FALSE && predicateElimination) {
      simplifyAndTerms(operands, FALSE);
    }
    ...
{code}

Inside this method, each term is simplified using the previous terms as 
"predicates" into the RexSimplify:
{code:java}
private void simplifyAndTerms(List<RexNode> terms, RexUnknownAs unknownAs) {
    RexSimplify simplify = this;
    for (int i = 0; i < terms.size(); i++) {
      RexNode t = terms.get(i);
      if (Predicate.of(t) == null) {
        continue;
      }
      terms.set(i, simplify.simplify(t, unknownAs)); // simplifies using 
previous terms as predicates
      RelOptPredicateList newPredicates =
          simplify.predicates.union(rexBuilder,
              RelOptPredicateList.of(rexBuilder, terms.subList(i, i + 1)));
      simplify = simplify.withPredicates(newPredicates);
    }
    ...
{code}

When simplifying each term (using the previous ones as predicates), since they 
are comparisons, we'll get inside RexSimplify#simplifyComparison, which calls 
at the end {{simplifyUsingPredicates}}:
{code:java}
private <C extends Comparable<C>> RexNode simplifyComparison(RexCall e,
      RexUnknownAs unknownAs, Class<C> clazz) {
    ...
    return simplifyUsingPredicates(e2, clazz);
  }
{code}

And {{simplifyUsingPredicates}} calls {{residue}} method, which can return 
{{RangeSets.rangeSetAll()}} (which results in returning IS NOT NULL, because 
"nullability might be problematic"), but only depending on the order in which 
the terms have been processed:
{code:java}
  private <C extends Comparable<C>> RexNode simplifyUsingPredicates(RexNode e,
      Class<C> clazz) {
    ...
    final C v0 = comparison.literal.getValueAs(clazz);
    ...
    final RangeSet<C> rangeSet = rangeSet(comparison.kind, v0);
    final RangeSet<C> rangeSet2 =
        residue(comparison.ref, rangeSet, predicates.pulledUpPredicates,
            clazz);
    if (rangeSet2.isEmpty()) {
      // Term is impossible to satisfy given these predicates
      return rexBuilder.makeLiteral(false);
    } else if (rangeSet2.equals(rangeSet)) {
      // no change
      return e;
    } else if (rangeSet2.equals(RangeSets.rangeSetAll())) {
      // Range is always satisfied given these predicates; but nullability might
      // be problematic
      return simplify(
          rexBuilder.makeCall(RexUtil.getPos(e), 
SqlStdOperatorTable.IS_NOT_NULL, comparison.ref),
          RexUnknownAs.UNKNOWN); // [1]
    ...
    } else {
      // range has been reduced but it's not worth simplifying
      return e; // [2]
    }
{code}

For example: 
- Processing x<10, x>0, x<20; will return in X is NOT NULL (via [1]) when x<20 
is processed here
- Processing x<10, x<20, x>0; will return in X is NOT NULL (via [1]) when x<20 
is processed here
- Processing x>0, x<10, x<20; will return x<20 (via [2]) when x<20 is processed 
here 
- Processing x>0, x<20, x<10; will return x<20 (via [2]) when x<20 is processed 
here (and x<10 when x<10 is processed) 
- Processing x<20, x>0, x<10; will return x<20 (via [2]) when x<20 is processed 
here (and x<10 when x<10 is processed)

Basically, the problem is: when the expression x<20 (i.e. range (-INF, 20)) is 
processed and the list of previous predicates contains as first item a 
predicate enclosed by it [3], in this case x<10 (i.e. range(-INF, 10)), 
{{residue}} will consider intermediate result as {{RangeSets.rangeSetAll()}} 
(i.e. (-INF, +INF), which will enclose any other predicate on subsequent 
iterations, so this will be the result to be returned:
{code:java}
private static <C extends Comparable<C>> RangeSet<C> residue(RexNode ref,
      RangeSet<C> r0, List<RexNode> predicates, Class<C> clazz) {
    RangeSet<C> result = r0;
    for (RexNode predicate : predicates) {
      ...
        final RexCall call = (RexCall) predicate;
        final Comparison comparison = Comparison.of(call);
        if (comparison != null && comparison.ref.equals(ref)) {
          final C c1 = comparison.literal.getValueAs(clazz);
          ...
            final Range<C> r1 = range(comparison.kind, c1);
            if (result.encloses(r1)) {
              // Given these predicates, term is always satisfied.
              // e.g. r0 is "$0 < 10", r1 is "$0 < 5"
              result = RangeSets.rangeSetAll(); [3]
              continue;
            }
            result = result.subRangeSet(r1); [4]
      ...
    }
    return result;
  }
{code}

Notice that, in case we process x<20 (i.e. range (-INF, 20)), but the list of 
predicates is not [(-INF, 10), (0, INF)] , but instead the other way around 
[(0, INF), (-INF, 10)]; then result will never be (-INF, +INF), but (0,10), 
since it will be accumulated as (0, 20) via [2] on the first iteration (when 
processing predicate (-INF, 10)); and then (0, 10) on the second iteration, 
again via [2] when processing this intermediate result with the predicate 
(-INF, 10).

To sum up, the fact that RexSimplify#residue result can differ depending on the 
order of its list of predicates parameter, results in a total different result 
on the initial RexSimplify#simplify call.





--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to