Re: Nullability support in structured types

Ruben Q L Fri, 19 Jun 2020 02:40:16 -0700

In my opinion, your propositions look like an improvement, so please go
ahead, open an issue and submit a patch.
We can discuss the details in the Jira / PR (hopefully other people will be
involved in the discussion too).


Best regards,
Ruben


Le ven. 19 juin 2020 à 09:45, Dawid Wysakowicz <[email protected]> a
écrit :

> Thank you for the reply Ruben,
>
> I understand the Calcite stand on the nullability. We should be fine
> having the custom logic in our factory.
>
> Happy to hear you are willing to accept some of the changes. Your
> summary is pretty good. Nevetheless I thought I'd extend it a little bit
> and also categorize it in two groups.
>
> 1. Changes that require copying Calcite classes (this is the group, we
> would like to contribute the most)
>
>     - SqlDotOperator#deriveType
>
>     - SqlItemOperator#inferReturnType (yes, I made a mistake with the
> methods in the original email)
>
>     Those two, as you said implement the logic that is already in
> SqlDotOperator#inferReturnType
>
>     - AliasNamespace#validateImpl - here the change is about forwading
> the original StructKind and nullability. Right now AliasNamespace always
> strips those and produces NOT NULL FULLY_QUALIFIED RelRecordType.
>
> 2. Changes that we can make by extending Calcite classes. It would be
> nice to have those also in Calcite, but we would be fine having it in
> Flink only. Even if not perfect, they go through extension points.
>
>     - RexBuilder#makeFieldAccessInternal -> in the attached commit it is
> implemented in FlinkRexBuilder, this is the same logic as in
> SqlDotOperator#deriveType/inferReturnType, SqlItemOperator#inferReturnType
>
>     - SqlValidatorImpl.DeriveTypeVisitor#visit(SqlIdentifier) -> in the
> attached commit it is implemented in FlinkSqlNameMatcher(which honestly
> is quite sketchy), again we would greatly appreciate if we could add the
> logic from SqlDotOperator#deriveType/inferReturnType,
> SqlItemOperator#inferReturnType to that method.
>
> If you are fine with it. I'll open an issue and submit a patch in the
> next few days.
>
> Best,
>
> Dawid
>
>
> On 18/06/2020 14:37, Ruben Q L wrote:
> > Hi Dawid,
> >
> > thanks for starting this discussion.
> >
> > The reason behind the current behavior of RelDataTypeFactoryImpl#
> > createTypeWithNullability (*"the way this method works right now is that
> it
> > changes all the nested fields to nullable if the the structured type is
> > nullable"*) is because during the discussion for CALCITE-2464 we decided
> > this was the best way to represent SQL standard (*"According to the SQL
> > standard, nullability for struct types can be defined only for columns,
> > which translates to top level structs. Nested struct attributes are
> always
> > nullable, so in principle we could always set the nested attributes to be
> > nullable"*).
> >
> > You say that this does not work for your case, but that you were able to
> > workaround the issue with your own version of TypeFactory. I am not sure
> if
> > this is the best solution for this problem, but at least it is a
> solution.
> >
> > Regarding the changes that you propose, if I am not mistaken they could
> be
> > summarized as:
> > - SqlDotOperator#deriveType
> > - SqlItemOperator#inferReturnType (your email mentions deriveType, but
> the
> > diff shows me it is actually inferReturnType)
> > - AliasNamespace#validateImpl
> >
> > At first glance, I think those changes are fair, and could be integrated
> as
> > a contribution. As I see it, you are applying in different methods the
> > logic that was already in place for SqlDotOperator#inferReturnType.
> >
> > Best regards,
> > Ruben
> >
> >
> > Le jeu. 18 juin 2020 à 12:29, Dawid Wysakowicz <[email protected]>
> a
> > écrit :
> >
> >> Hi devs,
> >>
> >> I wanted to reach out to the Apache Calcite community about an issue
> with
> >> nullability of structured types we observed in Apache Flink.
> >>
> >> First of all let me give you a quick introduction why it is important
> for
> >> us. What we want to support eventually is that we want to be able to map
> >> POJOs to structured types.
> >>
> >> For example, having a POJO like
> >> class Address {
> >>   public String street,
> >>   public int zip, // important it is a primitive int
> >>   public String city);
> >> }
> >>
> >> we want to map it a type
> >>
> >> CREATE TYPE Address AS (
> >>   street VARCHAR(20),
> >>   zip INT NOT NULL,
> >>   city VARCHAR(20) NOT NULL);
> >>
> >> and with that type we want to create a table
> >>
> >> CREATE TABLE Contacts (
> >>   name VARCHAR(20) NOT NULL,
> >>   address Address); -- Address structured type is nullable
> >>
> >> Unfortunately as described in CALCITE-2464[1], the way it works is that
> >> first a not null address type is created which is further converted to
> >> nullable via TypeFactor#createTypeWithNullability. Unfortunately the way
> >> this method works right now is that it changes all the nested fields to
> >> nullable if the the strucuted type is nullable. Therefore we can no
> longer
> >> map this type to our original pojo, because the zip field has nullable
> INT
> >> type now. Which in turn makes it impossible to pass that pojo as a
> >> parameter to a UDF:
> >>
> >> class UDF extends ScalarFunction {
> >>   public Object eval(Address address) {
> >>      if (address != null) {
> >>          // based on the type I know address.id is NOT NULL here
> >>         int id = address.id;
> >>      }
> >>   }
> >> }
> >>
> >> We were able to circumvent that somewhat in our version of the
> >> TypeFactory, but this causes problems when accessing fields of such
> type.
> >> In most of the places in Calcite as far as I can tell the type of an
> >> accessed nested field will simply be RelDataType#getField#getType, but
> this
> >> does not work if the outer type is nullable. Take the above example:
> >>
> >> SELECT address.zip FROM Contacts;
> >>
> >> This query should have a column of type INT instead of INT NOT NULL,
> >> because if the column itself (the address) is null we can not produce a
> not
> >> null zip.
> >>
> >> I was able to make it work with few changes to Calcite classes[2]:
> >>
> >> * SqlDotOperator - I had to change the deriveType method, this class has
> >> the needed logic in the inferReturnType
> >>
> >> * SqlItemOperator - the same change in deriveType
> >>
> >> * AliasNamespace - this class is stripping down the original nullability
> >>
> >> * FlinkRexBuilder, SqlNameMatcher - those classes are used when
> converting
> >> SqlIdentifier in a query like SELECT address.zip FROM Contacts;
> >>
> >>
> >> Here come my questions:
> >>
> >> 1. Do you have a suggestion/better way to handle the problem
> >>
> >> 2. Would you be willing to accept a contribution of the changes
> mentioned
> >> above? (I was thinking of the first three classes). We would like to use
> >> vanilla Calcite as much as possible.
> >>
> >>
> >> I am happy to answer any questions, if I described something not clear
> >> enough. I am also looking forward to any comments/suggestions.
> >>
> >> Best,
> >>
> >> Dawid
> >>
> >> [1] https://issues.apache.org/jira/browse/CALCITE-2464
> >>
> >> [2]
> >>
> https://github.com/apache/flink/pull/12649/commits/1c33fe9cb5491f47fb19c16f32de5e6aef5089d3
> >>
>
>

Re: Nullability support in structured types

Reply via email to