Indeed, if we manage to use the configuration from *flink-conf.yaml* down
the stack,
it would be easy for everyone to configure a "system-wide" legacy cast
behaviour.

Best regards,
Marios

On Tue, Mar 1, 2022 at 2:52 PM Timo Walther <twal...@apache.org> wrote:

> +1
>
> Thanks for bringing up this discussion one more time Marios.
>
> I strongly support enabling the new behavior in 1.15. It definitely has
> implications on existing users, but as Seth said, thinking about the
> upcoming upgrade story we need to make sure that at least the core/basic
> operations are correct. Otherwise we will have to maintain multiple
> versions of functions with broken semantics.
>
> I since we also try to fix various issues around configuration, maybe it
> might still be possible to configure the legacy cast behavior globally
> via flink-conf.yaml. This should make the transitioning period easier in
> production.
>
> Regards,
> Timo
>
> Am 28.02.22 um 19:04 schrieb Seth Wiesman:
> > +1
> >
> > Especially as SQL upgrades are right around the corner, it makes sense to
> > get our defaults right.
> >
> > Seth
> >
> > On Mon, Feb 28, 2022 at 7:14 AM Martijn Visser <mart...@ververica.com>
> > wrote:
> >
> >> +1 for setting this option to disabled by default. I believe failures
> >> should be brought forward as soon as possible, so they can be fixed as
> fast
> >> as possible. It will also be less confusing for new users. Last but not
> >> least, I believe the impact on existing users will be minimal (since it
> can
> >> be changed by changing one flag).
> >>
> >> Best regards,
> >>
> >> Martijn
> >>
> >> On Tue, 22 Feb 2022 at 17:55, Marios Trivyzas <mat...@gmail.com> wrote:
> >>
> >>> Thanks Francesco,
> >>>
> >>> The two arguments you posted, further strengthen the need to make it
> >>> DISABLED by default.
> >>>
> >>> On Tue, Feb 22, 2022 at 12:10 PM Francesco Guardiani <
> >>> france...@ververica.com> wrote:
> >>>
> >>>> Hi all,
> >>>> I'm +1 with what everything you said Marios.
> >>>>
> >>>> I'm gonna add another argument on top of that: the
> >> "legacy-cast-behavior"
> >>>> has also a broken type inference, leading to incorrect results or
> >> further
> >>>> errors down in the pipeline[1]. For example, take this:
> >>>>
> >>>> SELECT COALESCE(CAST('a' AS INT), 0) ...
> >>>>
> >>>> With the legacy cast behavior ENABLED, this is going to lead to the
> >> wrong
> >>>> result, as 'a' is inferred as VARCHAR NOT NULL, the CAST return value
> >> is
> >>>> inferred as INT NOT NULL, so the planner will drop COALESCE, and will
> >>> never
> >>>> return 0. Essentially, CAST will infer the wrong nullability leading
> to
> >>>> assigning its result to a NOT NULL type, when its value can
> effectively
> >>> be
> >>>> NULL.
> >>>>
> >>>>> You introduce a deprecated flag to help users
> >>>> using old versions of the software to smoothly transition to the new
> >>>> version, while the new users experience the new features/behavior,
> >>>> without the need to set a flag.
> >>>>
> >>>> This is IMO the major point on why we should disable the legacy cast
> >>>> behavior by default. This is even more relevant with 1.15 and the
> >> upgrade
> >>>> story, as per the problem described above, because users will now end
> >> up
> >>>> generating plans with wrong type inference, which will be hard to
> >> migrate
> >>>> in the next flink versions.
> >>>>
> >>>> FG
> >>>>
> >>>> [1] In case you're wondering why it wasn't fixed, the reason is that
> >>> fixing
> >>>> it means creating a breaking change, for details
> >>>> https://github.com/apache/flink/pull/18611#issuecomment-1028174877
> >>>>
> >>>>
> >>>> On Tue, Feb 22, 2022 at 10:07 AM Marios Trivyzas <mat...@gmail.com>
> >>> wrote:
> >>>>> Hello all!
> >>>>>
> >>>>> I would like to bring back the discussion regarding the
> >>>>> *table.exec.legacy-cast-behaviour*
> >>>>> configuration option which we are introducing with Flink *1.15*. This
> >>>>> option provides the users
> >>>>> with the flexibility to continue using the old (incorrect, according
> >> to
> >>>> SQL
> >>>>> standards) behaviour
> >>>>> of *CAST.*
> >>>>>
> >>>>> With Flink *1.15* we have introduced a bunch of fixes, improvements
> >> and
> >>>> new
> >>>>> casting functionality
> >>>>> between types, see https://issues.apache.org/jira/browse/FLINK-24403
> >> ,
> >>>> and
> >>>>> some of them are
> >>>>> guarded behind the legacy behaviour option:
> >>>>>
> >>>>>     - Trimming and padding when casting to CHAR/VARCHAR types to
> >> respect
> >>>> the
> >>>>>     specified length
> >>>>>     - Changes for casting various types to CHAR/VARCHAR/STRING
> >>>>>     - Runtime errors for CAST no longer emit *null *as result but
> >>>> exceptions
> >>>>>     are thrown with a meaningful message for the cause, that fail the
> >>>>> pipeline. *TRY_CAST
> >>>>>     *is introduced instead, which emits *null* results instead of
> >>> throwing
> >>>>>     exceptions.
> >>>>>
> >>>>> Those changes become active if users set the
> >>>>> *table.exec.legacy-cast-behaviour
> >>>>> *option to *DISABLED*, otherwise
> >>>>> they will continue to experience the old, *erroneous*, behaviour of
> >>>> *CAST*.
> >>>>> Currently, we have set the *table.exec.legacy-cast-behaviour *option
> >>>>> to be *ENABLED
> >>>>> *by default, so if users want
> >>>>> to get the new correct behaviour, they are required to set explicitly
> >>> the
> >>>>> option to *DISABLED*. Moreover, the option
> >>>>> itself is marked as deprecated, since the plan is to be removed in
> >> the
> >>>>> future, so that the old, erroneous behaviour
> >>>>> won't be an option, and the *TRY_CAST* would be the way to go if
> >> users
> >>>>> don't want to have errors and failed pipelines,
> >>>>> but have *null*s emitted in case of runtime errors when casting.
> >>>>>
> >>>>> I would like to start a discussion and maybe ask for voting, so that
> >> we
> >>>> set
> >>>>> the *table.exec.legacy-cast-behaviour* option
> >>>>> to *DISABLED *by default, so that users that want to keep their old
> >>>>> pipelines working the same way, without changing their
> >>>>> SQL/TableAPI code, would need to explicitly set it to *ENABLED.*
> >>>>>
> >>>>> My main argument for changing the default value for the option, is
> >> that
> >>>> the
> >>>>> *DISABLED* value is the one that enables
> >>>>> the *correct* behaviour for CAST which should be the default for all
> >>> new
> >>>>> users. This way, new FLINK users, or users which
> >>>>> build new pipelines, from now on would get the correct behaviour by
> >>>> default
> >>>>> without the need of changing some flag in their
> >>>>> configuration. It feels weird to me, especially for people very
> >>> familiar
> >>>>> with standard SQL, to be obliged to set some config
> >>>>> flag, to be able to get the correct behaviour for CAST. On top, users
> >>>> that
> >>>>> won't read about this option in our docs, will,
> >>>>> "blindly", experience the old incorrect behaviour for their new
> >>>> pipelines,
> >>>>> and issues that could cause the CAST to fail
> >>>>> will remain hidden from them, since *nulls* would be emitted. IMHO,
> >>> this
> >>>>> last part is also very important during the development
> >>>>> stages of an application/pipeline. Normally, developers would want to
> >>> see
> >>>>> all possible errors/scenarios during development
> >>>>> stages , in order to build a robust production system. If errors, are
> >>>>> hidden then, they can easily end up with those errors
> >>>>> in the production system which would be even harder to discover and
> >>>> debug,
> >>>>> since no exception will ever be thrown.
> >>>>> Imagine that there is a CAST which generates nulls because of runtime
> >>>>> errors, and its result is used in an aggregate function:
> >>>>>
> >>>>> SELECT AVG(CAST(col1 AS FLOAT)), ... FROM t GROUP BY ....
> >>>>>
> >>>>> If nulls are emitted by the CAST (let's say because some records
> >> have a
> >>>>> quoted string value for col1, then simply the AVG result
> >>>>> would be wrong and in would be extremely difficult to realise and fix
> >>> the
> >>>>> issue by simply wrapping col1 first with, for example,
> >>>>> a REGEXP_REPLACE, to get rid of the quotes.
> >>>>>
> >>>>> My second argument is that, since we have marked
> >>>>> *table.exec.legacy-cast-behaviour* as deprecated, and we want to
> >>>> completely
> >>>>> remove it in the future, if the default value is *DISABLED*, when
> >> it's
> >>>>> removed we also make a breaking change, by changing the
> >>>>> default behaviour for all users, which is against the common software
> >>>>> practices. You introduce a deprecated flag to help users
> >>>>> using old versions of the software to smoothly transition to the new
> >>>>> version, while the new users experience the new features/behaviour,
> >>>>> without the need to set a flag. So, when in the future this flag is
> >>>>> completely removed, for those "new" users it would be completely
> >>>>> transparent. If we continue with having the default value *ENABLED*,
> >>>> those
> >>>>> new user would experience an "unnatural" breaking change
> >>>>> when this option is completely removed.
> >>>>>
> >>>>> Best,
> >>>>> Marios
> >>>>>
> >>>
> >>> --
> >>> Marios
> >>>
>
>

-- 
Marios

Reply via email to