Re: Search/Sarg: untested feature merged to the default branch

Julian Hyde Mon, 16 Nov 2020 14:42:52 -0800

Pushing release 1.27 back to the New Year makes sense.

We need to do some work on Sargs. As to what that work should be,
things are going to get controversial. (I would appreciate help from
impartial elders like Stamatis, Francis, Haisheng to guide this
discussion to where we can agree.)

I agree with Vladimir that Search does not handle unknown (i.e.
boolean null) values properly. He proposes that Search should always
return true or false, and that is probably right. (That would imply
that we would have to add logic to check for null values before
calling Search.)

But then we run into differences of approach. Vladimir sees this
through the lens of RexFuzzyTest, and wants to get this on a solid
formal basis. (If you look at what Vladimir had to do in
https://github.com/apache/calcite/pull/2250 you will see that in
addition to tech debt in RexFuzzyTest, RexSimplify and elsewhere; bugs
https://issues.apache.org/jira/browse/CALCITE-4399,
https://issues.apache.org/jira/browse/CALCITE-4398,
https://issues.apache.org/jira/browse/CALCITE-4397,
https://issues.apache.org/jira/browse/CALCITE-4388; also, Danny's
https://issues.apache.org/jira/browse/CALCITE-4377 exists to fix tech
debt introduced in
https://issues.apache.org/jira/browse/CALCITE-3457.)

Meanwhile, I see the world through the lens of SQL statements giving
wrong results. A few issues in this area have been logged, and Danny
and I have fixed them (see
https://issues.apache.org/jira/browse/CALCITE-4325,
https://issues.apache.org/jira/browse/CALCITE-4352,
https://issues.apache.org/jira/browse/CALCITE-4364).

I have tried to find a case where SEARCH gives wrong results in the
presence of NULLs, and I have failed. For example, I crafted the
following test case in operator.iq:

diff --git a/core/src/test/resources/sql/operator.iq
b/core/src/test/resources/sql/operator.iq
index 904f1008e..6b71ca553 100644
--- a/core/src/test/resources/sql/operator.iq
+++ b/core/src/test/resources/sql/operator.iq
@@ -66,6 +66,35 @@ select * from "scott".emp where not sal > 1300 and
not sal < 1200;

 !ok

+# Sargs and three-valued boolean logic
+select empno, ename, job, comm,
+  comm between 1000 and 2000 as b,
+  case
+  when (comm between 1000 and 2000) is unknown then 'u'
+  when (comm between 1000 and 2000) is true then 't'
+  when (comm between 1000 and 2000) is false then 'f'
+  else '?'
+  end as truth,
+  comm in (0, 100, 200, 300) as b2,
+  comm in (0, 100, 200, 300, null) as b3
+from "scott".emp
+where deptno = 30
+order by empno;
++-------+--------+----------+---------+-------+-------+-------+------+
+| EMPNO | ENAME  | JOB      | COMM    | B     | TRUTH | B2    | B3   |
++-------+--------+----------+---------+-------+-------+-------+------+
+|  7499 | ALLEN  | SALESMAN |  300.00 | false | f     | true  | true |
+|  7521 | WARD   | SALESMAN |  500.00 | false | f     | false |      |
+|  7654 | MARTIN | SALESMAN | 1400.00 | true  | t     | false |      |
+|  7698 | BLAKE  | MANAGER  |         |       | u     |       |      |
+|  7844 | TURNER | SALESMAN |    0.00 | false | f     | true  | true |
+|  7900 | JAMES  | CLERK    |         |       | u     |       |      |
++-------+--------+----------+---------+-------+-------+-------+------+
+(6 rows)
+
+!ok
+
+
 # MULTISET EXCEPT
 values multiset ['a', 'c', 'a'] multiset except multiset ['a'];
 +--------+

By putting the condition in the SELECT clause, we see TRUE, FALSE and
UNKNOWN as distinct values; conditions are usually used in WHERE,
HAVING or ON, where UNKNOWN and FALSE are both treated as FALSE. Note
that the 'b' column has a mix of true, false and blank (unknown)
values.

Knowing that Search doesn't handle null values, I expected the query
to fail, but Calcite gives the correct results (I checked them on
hsqldb). I surmise that when Search is converted back to operators
during code generation, we also ignore 3-valued logic, and so the two
bugs cancel out.

This is frustrating for me trying to reproduce a bug before I fix it.
But it is good news for people using Calcite 1.26 - it is very
unlikely that they are being affected by the bug.

Back to our priorities before 1.27. I think we should continue to
identify and burn down tech debt in RexSimplify, RexProgramFuzzyTest.
That includes revisiting issues raised by Vladimir, above. (I am
skeptical about a few of these, such as that RexSimplify should not
change the nullability of an expression.)

To summarize. We need a measured discussion about what is wrong (level
of impact, level of effort to fix, and degree of disruption if we fix
it, or revert it) and I would appreciate the help of the elders to
moderate that discussion.

And I would love a SQL test case that shows that SEARCH is broken. :)

Julian

On Mon, Nov 16, 2020 at 1:56 PM Francis Chuang <[email protected]> wrote:
>
> We currently maintain a release cadence of a new release every two
> months. This puts the next release pretty close to Christmas next month.
>
> Would it make sense if we push 1.27.0 back a few weeks until maybe after
> new year? In the mean time, perhaps we can focus on getting these issues
> fixed and improve the reliability of search/sarge? Perhaps even to the
> extent where we divert some resources from other features and put more
> focus into search/sarge.
>
> Francis
>
> On 17/11/2020 5:38 am, Julian Hyde wrote:
> > And yet more passive-aggressive bulls**t from Vladimir:
> >
> > https://github.com/apache/calcite/commit/c6b3fb7ddbf8aed981ff3fb0632c77216dbe68d6
> >
> > I've force-pushed to remove it.
> >
> > I plan to respond to Stamatis' thoughtful email later today. Also, I
> > think I may have a test case that demonstrates the issues that
> > Vladimir is talking about; if so, I'll log a JIRA case. But for now, I
> > need to get on with my work week.
> >
> > Julian
> >
> >
> >
> >
> > On Mon, Nov 16, 2020 at 3:28 AM Vladimir Sitnikov
> > <[email protected]> wrote:
> >>
> >> Stamatis>@Vladimir: Can you specify which JIRA issue(s) you would like to
> >> see solved
> >> Stamatis>for the next release?
> >>
> >> RexProgramFuzzyTest should be enabled, and Sarg-related fuzzing should be
> >> implemented as well.
> >> RexFuzzer should generate Sarg expressions.
> >> RexInterpreter should be able to evaluate search/sarg.
> >> RexProgramFuzzyTest should pass both with and without expandSearch.
> >>
> >> I'm afraid I have no time on executing RexProgramFuzzyTest and logging
> >> failures as JIRA cases.
> >> I assume committers should ensure the code they commit passes existing
> >> tests, and they should ensure new code is accompanied by tests as well.
> >>
> >> It looks like I tried everything:
> >> a) I kindly asked Julian to refrain from committing untested code (both in
> >> PR and in the very first message of the thread)
> >> b) I highlighted major design issues (see unknownAs in [1])
> >> c) I provided sample failures (see "sample input" in [2])
> >> d) I even suggested the plan to resolve the issues: "revert search/sarg",
> >> "re-enable fuzzer", "fix issues discovered by the fuzzer", "re-add
> >> search/sarg".
> >> Not only I suggested the approach, I implemented a noticeable part of it
> >> (e.g.I re-enabled the fuzzer and added relevant code fixes to make the test
> >> pass, see CALCITE-4398, CALCITE-4397, CALCITE-4388), however, Julian
> >> discarded all of it with "A commit war is no way to proceed" comment.
> >>
> >>
> >> Now I stop doing everything related to search/sarg.
> >> Of course, if there's a release, and the fuzzer is still disabled, I would
> >> put my -1 vote.
> >>
> >> [1]
> >> https://lists.apache.org/thread.html/r602b7e17cf076f00dfbd946e814cea098a7b81e0b252bfa6e2fd5e9b%40%3Cdev.calcite.apache.org%3E
> >> [2]
> >> https://lists.apache.org/thread.html/r6d509f7202195362b33f93c076b5010bd1a9a5ac1e4fa6814819077c%40%3Cdev.calcite.apache.org%3E
> >>
> >> Vladimir

Re: Search/Sarg: untested feature merged to the default branch

Reply via email to