Re: Pushing filters with unsafe operations (CAST/DIVISION) below inner joins

Stamatis Zampetakis Fri, 18 Feb 2022 13:57:17 -0800

Thank you all for the feedback.

I will do a small research about the proper term to use for functions that
might throw / never throw and log a JIRA to move the discussion further.


@Viliam : Note that the problem you mention, although similar, is not
exactly the same as the one I brought up now.

The reordering of predicates in the WHERE clause has been brought up quite
a few times in the dev list and our stance [1] is that since the standard
leaves this decision to the implementor people should not rely on this (or
put the appropriate guards).

The case here is a bit different, at least the way I read the standard,
cause it defines the following:

"If all optional clauses are omitted, then the result of the <table
expression> is the same as the result of the
<from clause>. Otherwise, each specified clause is applied to the result of
the previously specified clause
and the result of the <table expression> is the result of the application
of the last specified clause."

and one of the optional clauses mentioned in the previous paragraph is the
<where clause>. There seems to be a clearly defined order between the <from
clause>, which includes inner joins, and the <where clause>.

Best,
Stamatis

[1] https://lists.apache.org/thread/mq44cnrohz19hh10btms126vbcoxl50w

On Fri, Feb 18, 2022 at 9:58 AM Viliam Durina <vil...@hazelcast.com> wrote:

> I have observed this issue years ago in well-known databases. My case was
> much simpler:
>
> data
> recordType:int  value:text
> -----           ------
> 0               1
> 1               a
>
> SELECT *
> FROM data
> WHERE recordType='1' AND CAST(value AS INT)<10
>
>
> SQL is declarative, and unlike procedural languages, it doesn't prescribe
> short-circuit evaluation of the WHERE clause, or any specific evaluation
> order. If it was prescribed, the query would be perfectly safe. But
> prescribing the evaluation order would rule out many optimizations, or make
> them much harder, such as this half-null-half-error value.
>
> For example, reordering additions might or might not lead to overflow:
> TINYINTs `100 + 100 - 90`, evaluated in this order, overflow, but `100 - 90
> + 100` don't - imagine each value comes from a different table and we
> reorder the joins. Perhaps result of TINYINT addition can be SMALLINT, but
> what if they are BIGINTs?
>
> My understanding was that any expression must not fail in any plausible
> execution plan. Therefore if I parse the column `value`, it must succeed
> for every possible scanned row. In my specific case I ended up implementing
> a custom TO_NUMBER function that returns null on parsing error, and that
> null never appeared in the result because of the other condition.
>
> Viliam
>
> On Thu, 17 Feb 2022 at 20:08, Julian Hyde <jhyde.apa...@gmail.com> wrote:
>
> > You should definitely log a bug, even if there are no plans to fix it. It
> > is a violation of the standard, and therefore it is a bug, and therefore
> we
> > should document that it exists.
> >
> > Can you do some research on the right terminology. You use the term
> > ‘unsafe’. I think the PL community uses the term ’strict’ for expressions
> > that must be evaluated (and therefore if they throw, the query is sure to
> > abort). We have other related concepts floating around in Calcite -
> dynamic
> > functions, deterministic operators - and standardizing terminology is key
> > if we are to keep the menagerie in order.
> >
> > There might be a way to have our cake and eat it too. We could push down
> > the division, and if there is a division-by-zero we return a special
> value
> > such as NULL. If the special value is not eliminated by the join then it
> is
> > promoted to a throw. The “cake” here is the performance benefit of
> pushing
> > down a filter (that may never or rarely throw); the “eating it” is safety
> > on the occasion that the filter does throw.
> >
> > Even if that strategy doesn’t work, maybe we could have a flag that says
> > whether to push down conditions that might throw.
> >
> > Julian
> >
> >
> > > On Feb 17, 2022, at 8:07 AM, Scott Reynolds <sdrreyno...@gmail.com>
> > wrote:
> > >
> > > Is it feasible to prevent the filter push down in cases like this (
> > detect
> > > potential division by zero) ? What other exceptions that could happen?
> > >
> > > Should it only push down some filters and preserve the complex filter
> > above
> > > the join?
> > >
> > > Regarding a Jira ticket, generating an exception when it should produce
> > > valid results is gotta be a Jira case IMHO.
> > >
> > >
> > >
> > >
> > > On Thu, Feb 17, 2022, 7:17 AM Stamatis Zampetakis <zabe...@gmail.com>
> > wrote:
> > >
> > >> Hi all,
> > >>
> > >> Till today, I believed that pushing a filter below an inner join is
> not
> > >> strictly related to the operators (equals, plus, minus, cast,
> division)
> > >> present in the filter.
> > >>
> > >> However, the query below will return some results if the filter is
> > executed
> > >> after the join or it will fail with an exception if the filter is
> pushed
> > >> below the join.
> > >>
> > >> EMP [EMPNO, NAME, DEPTNO]
> > >> 0   | Alex | 0
> > >> 10 | Bob | 1
> > >>
> > >> DEP [DEPTNO]
> > >> 1
> > >>
> > >> SELECT e.name
> > >> FROM emp e
> > >> INNER JOIN dep d ON e.deptno = d.deptno
> > >> WHERE (10 / e.empno) = 1
> > >>
> > >> It seems that SQL standard (Section 7.4 general rule 1) mandates that
> > WHERE
> > >> should be applied to the result of FROM so in the case above pushing
> > >> filters below the join seems to violate the standard.
> > >>
> > >> Currently, in Calcite we are going to push the division, cast, and
> > >> basically any kind of operator below the join. Many well-known
> databases
> > >> e.g., Postgres, do that as well.
> > >>
> > >> Should we log it as a bug and try to do something about it or let it
> be
> > as
> > >> it is?
> > >>
> > >> Best,
> > >> Stamatis
> > >>
> >
> >
>
> --
> This message contains confidential information and is intended only for
> the
> individuals named. If you are not the named addressee you should not
> disseminate, distribute or copy this e-mail. Please notify the sender
> immediately by e-mail if you have received this e-mail by mistake and
> delete this e-mail from your system. E-mail transmission cannot be
> guaranteed to be secure or error-free as information could be intercepted,
> corrupted, lost, destroyed, arrive late or incomplete, or contain viruses.
> The sender therefore does not accept liability for any errors or omissions
> in the contents of this message, which arise as a result of e-mail
> transmission. If verification is required, please request a hard-copy
> version. -Hazelcast
>

Re: Pushing filters with unsafe operations (CAST/DIVISION) below inner joins

Reply via email to