Hi Dawid, thanks for starting this discussion.
The reason behind the current behavior of RelDataTypeFactoryImpl# createTypeWithNullability (*"the way this method works right now is that it changes all the nested fields to nullable if the the structured type is nullable"*) is because during the discussion for CALCITE-2464 we decided this was the best way to represent SQL standard (*"According to the SQL standard, nullability for struct types can be defined only for columns, which translates to top level structs. Nested struct attributes are always nullable, so in principle we could always set the nested attributes to be nullable"*). You say that this does not work for your case, but that you were able to workaround the issue with your own version of TypeFactory. I am not sure if this is the best solution for this problem, but at least it is a solution. Regarding the changes that you propose, if I am not mistaken they could be summarized as: - SqlDotOperator#deriveType - SqlItemOperator#inferReturnType (your email mentions deriveType, but the diff shows me it is actually inferReturnType) - AliasNamespace#validateImpl At first glance, I think those changes are fair, and could be integrated as a contribution. As I see it, you are applying in different methods the logic that was already in place for SqlDotOperator#inferReturnType. Best regards, Ruben Le jeu. 18 juin 2020 à 12:29, Dawid Wysakowicz <[email protected]> a écrit : > Hi devs, > > I wanted to reach out to the Apache Calcite community about an issue with > nullability of structured types we observed in Apache Flink. > > First of all let me give you a quick introduction why it is important for > us. What we want to support eventually is that we want to be able to map > POJOs to structured types. > > For example, having a POJO like > class Address { > public String street, > public int zip, // important it is a primitive int > public String city); > } > > we want to map it a type > > CREATE TYPE Address AS ( > street VARCHAR(20), > zip INT NOT NULL, > city VARCHAR(20) NOT NULL); > > and with that type we want to create a table > > CREATE TABLE Contacts ( > name VARCHAR(20) NOT NULL, > address Address); -- Address structured type is nullable > > Unfortunately as described in CALCITE-2464[1], the way it works is that > first a not null address type is created which is further converted to > nullable via TypeFactor#createTypeWithNullability. Unfortunately the way > this method works right now is that it changes all the nested fields to > nullable if the the strucuted type is nullable. Therefore we can no longer > map this type to our original pojo, because the zip field has nullable INT > type now. Which in turn makes it impossible to pass that pojo as a > parameter to a UDF: > > class UDF extends ScalarFunction { > public Object eval(Address address) { > if (address != null) { > // based on the type I know address.id is NOT NULL here > int id = address.id; > } > } > } > > We were able to circumvent that somewhat in our version of the > TypeFactory, but this causes problems when accessing fields of such type. > In most of the places in Calcite as far as I can tell the type of an > accessed nested field will simply be RelDataType#getField#getType, but this > does not work if the outer type is nullable. Take the above example: > > SELECT address.zip FROM Contacts; > > This query should have a column of type INT instead of INT NOT NULL, > because if the column itself (the address) is null we can not produce a not > null zip. > > I was able to make it work with few changes to Calcite classes[2]: > > * SqlDotOperator - I had to change the deriveType method, this class has > the needed logic in the inferReturnType > > * SqlItemOperator - the same change in deriveType > > * AliasNamespace - this class is stripping down the original nullability > > * FlinkRexBuilder, SqlNameMatcher - those classes are used when converting > SqlIdentifier in a query like SELECT address.zip FROM Contacts; > > > Here come my questions: > > 1. Do you have a suggestion/better way to handle the problem > > 2. Would you be willing to accept a contribution of the changes mentioned > above? (I was thinking of the first three classes). We would like to use > vanilla Calcite as much as possible. > > > I am happy to answer any questions, if I described something not clear > enough. I am also looking forward to any comments/suggestions. > > Best, > > Dawid > > [1] https://issues.apache.org/jira/browse/CALCITE-2464 > > [2] > https://github.com/apache/flink/pull/12649/commits/1c33fe9cb5491f47fb19c16f32de5e6aef5089d3 >
