In my opinion, your propositions look like an improvement, so please go ahead, open an issue and submit a patch. We can discuss the details in the Jira / PR (hopefully other people will be involved in the discussion too).
Best regards, Ruben Le ven. 19 juin 2020 à 09:45, Dawid Wysakowicz <[email protected]> a écrit : > Thank you for the reply Ruben, > > I understand the Calcite stand on the nullability. We should be fine > having the custom logic in our factory. > > Happy to hear you are willing to accept some of the changes. Your > summary is pretty good. Nevetheless I thought I'd extend it a little bit > and also categorize it in two groups. > > 1. Changes that require copying Calcite classes (this is the group, we > would like to contribute the most) > > - SqlDotOperator#deriveType > > - SqlItemOperator#inferReturnType (yes, I made a mistake with the > methods in the original email) > > Those two, as you said implement the logic that is already in > SqlDotOperator#inferReturnType > > - AliasNamespace#validateImpl - here the change is about forwading > the original StructKind and nullability. Right now AliasNamespace always > strips those and produces NOT NULL FULLY_QUALIFIED RelRecordType. > > 2. Changes that we can make by extending Calcite classes. It would be > nice to have those also in Calcite, but we would be fine having it in > Flink only. Even if not perfect, they go through extension points. > > - RexBuilder#makeFieldAccessInternal -> in the attached commit it is > implemented in FlinkRexBuilder, this is the same logic as in > SqlDotOperator#deriveType/inferReturnType, SqlItemOperator#inferReturnType > > - SqlValidatorImpl.DeriveTypeVisitor#visit(SqlIdentifier) -> in the > attached commit it is implemented in FlinkSqlNameMatcher(which honestly > is quite sketchy), again we would greatly appreciate if we could add the > logic from SqlDotOperator#deriveType/inferReturnType, > SqlItemOperator#inferReturnType to that method. > > If you are fine with it. I'll open an issue and submit a patch in the > next few days. > > Best, > > Dawid > > > On 18/06/2020 14:37, Ruben Q L wrote: > > Hi Dawid, > > > > thanks for starting this discussion. > > > > The reason behind the current behavior of RelDataTypeFactoryImpl# > > createTypeWithNullability (*"the way this method works right now is that > it > > changes all the nested fields to nullable if the the structured type is > > nullable"*) is because during the discussion for CALCITE-2464 we decided > > this was the best way to represent SQL standard (*"According to the SQL > > standard, nullability for struct types can be defined only for columns, > > which translates to top level structs. Nested struct attributes are > always > > nullable, so in principle we could always set the nested attributes to be > > nullable"*). > > > > You say that this does not work for your case, but that you were able to > > workaround the issue with your own version of TypeFactory. I am not sure > if > > this is the best solution for this problem, but at least it is a > solution. > > > > Regarding the changes that you propose, if I am not mistaken they could > be > > summarized as: > > - SqlDotOperator#deriveType > > - SqlItemOperator#inferReturnType (your email mentions deriveType, but > the > > diff shows me it is actually inferReturnType) > > - AliasNamespace#validateImpl > > > > At first glance, I think those changes are fair, and could be integrated > as > > a contribution. As I see it, you are applying in different methods the > > logic that was already in place for SqlDotOperator#inferReturnType. > > > > Best regards, > > Ruben > > > > > > Le jeu. 18 juin 2020 à 12:29, Dawid Wysakowicz <[email protected]> > a > > écrit : > > > >> Hi devs, > >> > >> I wanted to reach out to the Apache Calcite community about an issue > with > >> nullability of structured types we observed in Apache Flink. > >> > >> First of all let me give you a quick introduction why it is important > for > >> us. What we want to support eventually is that we want to be able to map > >> POJOs to structured types. > >> > >> For example, having a POJO like > >> class Address { > >> public String street, > >> public int zip, // important it is a primitive int > >> public String city); > >> } > >> > >> we want to map it a type > >> > >> CREATE TYPE Address AS ( > >> street VARCHAR(20), > >> zip INT NOT NULL, > >> city VARCHAR(20) NOT NULL); > >> > >> and with that type we want to create a table > >> > >> CREATE TABLE Contacts ( > >> name VARCHAR(20) NOT NULL, > >> address Address); -- Address structured type is nullable > >> > >> Unfortunately as described in CALCITE-2464[1], the way it works is that > >> first a not null address type is created which is further converted to > >> nullable via TypeFactor#createTypeWithNullability. Unfortunately the way > >> this method works right now is that it changes all the nested fields to > >> nullable if the the strucuted type is nullable. Therefore we can no > longer > >> map this type to our original pojo, because the zip field has nullable > INT > >> type now. Which in turn makes it impossible to pass that pojo as a > >> parameter to a UDF: > >> > >> class UDF extends ScalarFunction { > >> public Object eval(Address address) { > >> if (address != null) { > >> // based on the type I know address.id is NOT NULL here > >> int id = address.id; > >> } > >> } > >> } > >> > >> We were able to circumvent that somewhat in our version of the > >> TypeFactory, but this causes problems when accessing fields of such > type. > >> In most of the places in Calcite as far as I can tell the type of an > >> accessed nested field will simply be RelDataType#getField#getType, but > this > >> does not work if the outer type is nullable. Take the above example: > >> > >> SELECT address.zip FROM Contacts; > >> > >> This query should have a column of type INT instead of INT NOT NULL, > >> because if the column itself (the address) is null we can not produce a > not > >> null zip. > >> > >> I was able to make it work with few changes to Calcite classes[2]: > >> > >> * SqlDotOperator - I had to change the deriveType method, this class has > >> the needed logic in the inferReturnType > >> > >> * SqlItemOperator - the same change in deriveType > >> > >> * AliasNamespace - this class is stripping down the original nullability > >> > >> * FlinkRexBuilder, SqlNameMatcher - those classes are used when > converting > >> SqlIdentifier in a query like SELECT address.zip FROM Contacts; > >> > >> > >> Here come my questions: > >> > >> 1. Do you have a suggestion/better way to handle the problem > >> > >> 2. Would you be willing to accept a contribution of the changes > mentioned > >> above? (I was thinking of the first three classes). We would like to use > >> vanilla Calcite as much as possible. > >> > >> > >> I am happy to answer any questions, if I described something not clear > >> enough. I am also looking forward to any comments/suggestions. > >> > >> Best, > >> > >> Dawid > >> > >> [1] https://issues.apache.org/jira/browse/CALCITE-2464 > >> > >> [2] > >> > https://github.com/apache/flink/pull/12649/commits/1c33fe9cb5491f47fb19c16f32de5e6aef5089d3 > >> > >
