Thank you for the reply Ruben, I understand the Calcite stand on the nullability. We should be fine having the custom logic in our factory.
Happy to hear you are willing to accept some of the changes. Your summary is pretty good. Nevetheless I thought I'd extend it a little bit and also categorize it in two groups. 1. Changes that require copying Calcite classes (this is the group, we would like to contribute the most) - SqlDotOperator#deriveType - SqlItemOperator#inferReturnType (yes, I made a mistake with the methods in the original email) Those two, as you said implement the logic that is already in SqlDotOperator#inferReturnType - AliasNamespace#validateImpl - here the change is about forwading the original StructKind and nullability. Right now AliasNamespace always strips those and produces NOT NULL FULLY_QUALIFIED RelRecordType. 2. Changes that we can make by extending Calcite classes. It would be nice to have those also in Calcite, but we would be fine having it in Flink only. Even if not perfect, they go through extension points. - RexBuilder#makeFieldAccessInternal -> in the attached commit it is implemented in FlinkRexBuilder, this is the same logic as in SqlDotOperator#deriveType/inferReturnType, SqlItemOperator#inferReturnType - SqlValidatorImpl.DeriveTypeVisitor#visit(SqlIdentifier) -> in the attached commit it is implemented in FlinkSqlNameMatcher(which honestly is quite sketchy), again we would greatly appreciate if we could add the logic from SqlDotOperator#deriveType/inferReturnType, SqlItemOperator#inferReturnType to that method. If you are fine with it. I'll open an issue and submit a patch in the next few days. Best, Dawid On 18/06/2020 14:37, Ruben Q L wrote: > Hi Dawid, > > thanks for starting this discussion. > > The reason behind the current behavior of RelDataTypeFactoryImpl# > createTypeWithNullability (*"the way this method works right now is that it > changes all the nested fields to nullable if the the structured type is > nullable"*) is because during the discussion for CALCITE-2464 we decided > this was the best way to represent SQL standard (*"According to the SQL > standard, nullability for struct types can be defined only for columns, > which translates to top level structs. Nested struct attributes are always > nullable, so in principle we could always set the nested attributes to be > nullable"*). > > You say that this does not work for your case, but that you were able to > workaround the issue with your own version of TypeFactory. I am not sure if > this is the best solution for this problem, but at least it is a solution. > > Regarding the changes that you propose, if I am not mistaken they could be > summarized as: > - SqlDotOperator#deriveType > - SqlItemOperator#inferReturnType (your email mentions deriveType, but the > diff shows me it is actually inferReturnType) > - AliasNamespace#validateImpl > > At first glance, I think those changes are fair, and could be integrated as > a contribution. As I see it, you are applying in different methods the > logic that was already in place for SqlDotOperator#inferReturnType. > > Best regards, > Ruben > > > Le jeu. 18 juin 2020 à 12:29, Dawid Wysakowicz <[email protected]> a > écrit : > >> Hi devs, >> >> I wanted to reach out to the Apache Calcite community about an issue with >> nullability of structured types we observed in Apache Flink. >> >> First of all let me give you a quick introduction why it is important for >> us. What we want to support eventually is that we want to be able to map >> POJOs to structured types. >> >> For example, having a POJO like >> class Address { >> public String street, >> public int zip, // important it is a primitive int >> public String city); >> } >> >> we want to map it a type >> >> CREATE TYPE Address AS ( >> street VARCHAR(20), >> zip INT NOT NULL, >> city VARCHAR(20) NOT NULL); >> >> and with that type we want to create a table >> >> CREATE TABLE Contacts ( >> name VARCHAR(20) NOT NULL, >> address Address); -- Address structured type is nullable >> >> Unfortunately as described in CALCITE-2464[1], the way it works is that >> first a not null address type is created which is further converted to >> nullable via TypeFactor#createTypeWithNullability. Unfortunately the way >> this method works right now is that it changes all the nested fields to >> nullable if the the strucuted type is nullable. Therefore we can no longer >> map this type to our original pojo, because the zip field has nullable INT >> type now. Which in turn makes it impossible to pass that pojo as a >> parameter to a UDF: >> >> class UDF extends ScalarFunction { >> public Object eval(Address address) { >> if (address != null) { >> // based on the type I know address.id is NOT NULL here >> int id = address.id; >> } >> } >> } >> >> We were able to circumvent that somewhat in our version of the >> TypeFactory, but this causes problems when accessing fields of such type. >> In most of the places in Calcite as far as I can tell the type of an >> accessed nested field will simply be RelDataType#getField#getType, but this >> does not work if the outer type is nullable. Take the above example: >> >> SELECT address.zip FROM Contacts; >> >> This query should have a column of type INT instead of INT NOT NULL, >> because if the column itself (the address) is null we can not produce a not >> null zip. >> >> I was able to make it work with few changes to Calcite classes[2]: >> >> * SqlDotOperator - I had to change the deriveType method, this class has >> the needed logic in the inferReturnType >> >> * SqlItemOperator - the same change in deriveType >> >> * AliasNamespace - this class is stripping down the original nullability >> >> * FlinkRexBuilder, SqlNameMatcher - those classes are used when converting >> SqlIdentifier in a query like SELECT address.zip FROM Contacts; >> >> >> Here come my questions: >> >> 1. Do you have a suggestion/better way to handle the problem >> >> 2. Would you be willing to accept a contribution of the changes mentioned >> above? (I was thinking of the first three classes). We would like to use >> vanilla Calcite as much as possible. >> >> >> I am happy to answer any questions, if I described something not clear >> enough. I am also looking forward to any comments/suggestions. >> >> Best, >> >> Dawid >> >> [1] https://issues.apache.org/jira/browse/CALCITE-2464 >> >> [2] >> https://github.com/apache/flink/pull/12649/commits/1c33fe9cb5491f47fb19c16f32de5e6aef5089d3 >>
signature.asc
Description: OpenPGP digital signature
