Thank you all for the feedback. I will do a small research about the proper term to use for functions that might throw / never throw and log a JIRA to move the discussion further.
@Viliam : Note that the problem you mention, although similar, is not exactly the same as the one I brought up now. The reordering of predicates in the WHERE clause has been brought up quite a few times in the dev list and our stance [1] is that since the standard leaves this decision to the implementor people should not rely on this (or put the appropriate guards). The case here is a bit different, at least the way I read the standard, cause it defines the following: "If all optional clauses are omitted, then the result of the <table expression> is the same as the result of the <from clause>. Otherwise, each specified clause is applied to the result of the previously specified clause and the result of the <table expression> is the result of the application of the last specified clause." and one of the optional clauses mentioned in the previous paragraph is the <where clause>. There seems to be a clearly defined order between the <from clause>, which includes inner joins, and the <where clause>. Best, Stamatis [1] https://lists.apache.org/thread/mq44cnrohz19hh10btms126vbcoxl50w On Fri, Feb 18, 2022 at 9:58 AM Viliam Durina <vil...@hazelcast.com> wrote: > I have observed this issue years ago in well-known databases. My case was > much simpler: > > data > recordType:int value:text > ----- ------ > 0 1 > 1 a > > SELECT * > FROM data > WHERE recordType='1' AND CAST(value AS INT)<10 > > > SQL is declarative, and unlike procedural languages, it doesn't prescribe > short-circuit evaluation of the WHERE clause, or any specific evaluation > order. If it was prescribed, the query would be perfectly safe. But > prescribing the evaluation order would rule out many optimizations, or make > them much harder, such as this half-null-half-error value. > > For example, reordering additions might or might not lead to overflow: > TINYINTs `100 + 100 - 90`, evaluated in this order, overflow, but `100 - 90 > + 100` don't - imagine each value comes from a different table and we > reorder the joins. Perhaps result of TINYINT addition can be SMALLINT, but > what if they are BIGINTs? > > My understanding was that any expression must not fail in any plausible > execution plan. Therefore if I parse the column `value`, it must succeed > for every possible scanned row. In my specific case I ended up implementing > a custom TO_NUMBER function that returns null on parsing error, and that > null never appeared in the result because of the other condition. > > Viliam > > On Thu, 17 Feb 2022 at 20:08, Julian Hyde <jhyde.apa...@gmail.com> wrote: > > > You should definitely log a bug, even if there are no plans to fix it. It > > is a violation of the standard, and therefore it is a bug, and therefore > we > > should document that it exists. > > > > Can you do some research on the right terminology. You use the term > > ‘unsafe’. I think the PL community uses the term ’strict’ for expressions > > that must be evaluated (and therefore if they throw, the query is sure to > > abort). We have other related concepts floating around in Calcite - > dynamic > > functions, deterministic operators - and standardizing terminology is key > > if we are to keep the menagerie in order. > > > > There might be a way to have our cake and eat it too. We could push down > > the division, and if there is a division-by-zero we return a special > value > > such as NULL. If the special value is not eliminated by the join then it > is > > promoted to a throw. The “cake” here is the performance benefit of > pushing > > down a filter (that may never or rarely throw); the “eating it” is safety > > on the occasion that the filter does throw. > > > > Even if that strategy doesn’t work, maybe we could have a flag that says > > whether to push down conditions that might throw. > > > > Julian > > > > > > > On Feb 17, 2022, at 8:07 AM, Scott Reynolds <sdrreyno...@gmail.com> > > wrote: > > > > > > Is it feasible to prevent the filter push down in cases like this ( > > detect > > > potential division by zero) ? What other exceptions that could happen? > > > > > > Should it only push down some filters and preserve the complex filter > > above > > > the join? > > > > > > Regarding a Jira ticket, generating an exception when it should produce > > > valid results is gotta be a Jira case IMHO. > > > > > > > > > > > > > > > On Thu, Feb 17, 2022, 7:17 AM Stamatis Zampetakis <zabe...@gmail.com> > > wrote: > > > > > >> Hi all, > > >> > > >> Till today, I believed that pushing a filter below an inner join is > not > > >> strictly related to the operators (equals, plus, minus, cast, > division) > > >> present in the filter. > > >> > > >> However, the query below will return some results if the filter is > > executed > > >> after the join or it will fail with an exception if the filter is > pushed > > >> below the join. > > >> > > >> EMP [EMPNO, NAME, DEPTNO] > > >> 0 | Alex | 0 > > >> 10 | Bob | 1 > > >> > > >> DEP [DEPTNO] > > >> 1 > > >> > > >> SELECT e.name > > >> FROM emp e > > >> INNER JOIN dep d ON e.deptno = d.deptno > > >> WHERE (10 / e.empno) = 1 > > >> > > >> It seems that SQL standard (Section 7.4 general rule 1) mandates that > > WHERE > > >> should be applied to the result of FROM so in the case above pushing > > >> filters below the join seems to violate the standard. > > >> > > >> Currently, in Calcite we are going to push the division, cast, and > > >> basically any kind of operator below the join. Many well-known > databases > > >> e.g., Postgres, do that as well. > > >> > > >> Should we log it as a bug and try to do something about it or let it > be > > as > > >> it is? > > >> > > >> Best, > > >> Stamatis > > >> > > > > > > -- > This message contains confidential information and is intended only for > the > individuals named. If you are not the named addressee you should not > disseminate, distribute or copy this e-mail. Please notify the sender > immediately by e-mail if you have received this e-mail by mistake and > delete this e-mail from your system. E-mail transmission cannot be > guaranteed to be secure or error-free as information could be intercepted, > corrupted, lost, destroyed, arrive late or incomplete, or contain viruses. > The sender therefore does not accept liability for any errors or omissions > in the contents of this message, which arise as a result of e-mail > transmission. If verification is required, please request a hard-copy > version. -Hazelcast >