[jira] [Commented] (CALCITE-4085) Improve nullability support for fields of structured type

Dawid Wysakowicz (Jira) Thu, 02 Jul 2020 02:08:08 -0700


    [ 
https://issues.apache.org/jira/browse/CALCITE-4085?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17150092#comment-17150092
 ]


Dawid Wysakowicz commented on CALCITE-4085:
-------------------------------------------

I'd like to split my comment into two parts. First I'd like to put out what is 
the minimal set of changes that would help us in Flink. Of course it is only my 
perspective, and I do understand it might not be in line with Calcite's 
goals/roadmap.  In the second part I'd like to reply to the general problem how 
the nullability of attributes is handled as I find it interesting. Still just 
to emphasize it is not my goal to change the behaviour, even though personally 
I'd find it helpful.

1.

In Flink we do want to preserve the nullability of attributes. Similarly to 
Calcite's {{JavaType}}. We find that behaviour more intuitive and powerful. We 
can use that for mapping to  java types in e.g. UDFs, but also provide a more 
fine grained constraints on Tables. We do not have problems with creating such 
a type. The {{RelRecordType}} ctor is public after all. However we find the 
accessor methods do not work with such types. The accessor method, as described 
in the SQL standard, should return NULL on NULL input. Input in this case is 
the structured type itself. If I am not wrong it is the 
{{SqlTypeTransforms.TO_NULLABLE}} that implements that logic. 

I disagree with the satement from [~danny0405]:

??my main concern is moving the nullability from type factory to all kinds of 
sql contexts make the logic dispersed and fragile, it is hard to keep it right. 
And it does not have good sustainability when new sql contexts are introduced, 
i.e. new syntax or operators.??

The “RETURNS NULL ON NULL INPUT” property is a characteristic of a function, 
not a type, and is defined by the SQL standard:

??A null-call function is an SQL-invoked function that is defined to return the 
null value if any of its input arguments is the null value. A null-call 
function is an SQL-invoked function whose <null-call clause> specifies “RETURNS 
NULL ON NULL INPUT”.??

BTW you do not change the nullability of elements of ARRAY, MAP, MULTISET in 
the factory, because the accessors return NULLABLE types. In order to achieve 
the point one I identified a few locations which are problematic for us. I also 
do not like that there is so many locations with that logic, but that's where 
the "accessor" method is implemented.

* SqlDotOperator#deriveType -> accessor method for structured types that come 
from an expression e.g. SELECT udf_returns_structured_type().field0;
* SqlItemOperator#inferReturnType -> accessor method for structured types with 
an item notation e.g. SELECT udf_returns_structured_type()["field0"];
* AliasNamespace#validateImpl -> here the namespace drops the nullability (it 
is a different problem from the rest)
* RexBuilder#makeFieldAccessInternal and 
SqlValidatorImpl.DeriveTypeVisitor#visit(SqlIdentifier) -> accessor method for 
structured types that come from table e.g. SELECT t.field0 FROM testTable t;

If you disagree with the changes I proposed in the PR, can you suggest a way 
for us to implement that logic without copying Calcite classes (now we need to 
copy over SqlDotOperator, SqlItemOperator, AliasNamespace).

2.

As for the general problem of attributes nullability. First of all thank you 
[~zabetak] for the references.

The fragment that I find most applicable to our problem is:

Structured types can be used as the data types of variables, arguments and 
parameters, and so forth, so permitting constraints to be put on their 
attributes would violate that aspect of SQL's design. Instead, the designers 
concluded that constraints on some column whose data type is some structured 
type could be used to control the values allowed in the type's attributes, but 
only for instances stored in that column.

Unfortunately it is not very definitive in my opinion. It would rather give me 
the impression that you can put  a NOT NULL constraint on an attribute by 
putting it on the column. That's not how e.g. Oracle implements it. You can 
e.g. do:

{code}
CREATE TYPE emp_det IS OBJECT 
( 
EMP_NO VARCHAR2(2), 
EMP_NAME VARCHAR2(150), 
MANAGER NUMBER, 
SALARY NUMBER  
);

CREATE TABLE testTable (
    f0 emp_det NOT NULL,
    abc NUMBER
);

INSERT INTO testTable (f0, abc) VALUES (null, 1); --> throws an exception, as 
it validates the constraint
INSERT INTO testTable (f0, abc) VALUES (emp_det(null, null, null, null), 1); -> 
works just fine
{code}

The fragment from CALCITE-2464 that provides a table that describes behaviour 
of IS NULL/IS NOT NULL describes behaviour of row value predicand. It is not 
equivalent to structure type. Therefore I don't think it applies to the problem 
of structured types attributes. It can be e.g. a field reference:

{code}
<row value predicand> ::=
<row value special case>
| <row value constructor predicand>

<row value special case> ::=
<nonparenthesized value expression primary>

<nonparenthesized value expression primary> ::=
<unsigned value specification>
| <column reference>
| <set function specification>
| <window function>
| <nested window function>
| <scalar subquery>
| <case expression>
| <cast specification>
| <field reference>
| <subtype treatment>
| <method invocation>
| <static method invocation>
| <new specification>
| <attribute or method reference>
| <reference resolution>
| <collection value constructor>
| <array element reference>
| <multiset element reference>
| <next value expression>
| <routine invocation>
| <row pattern navigation operation>
{code}

I also think the current behaviour will not work in a following scenario:

{code}
CREATE TYPE emp_det IS OBJECT 
( 
EMP_NO VARCHAR2(2) NOT NULL, 
EMP_NAME VARCHAR2(150), 
MANAGER NUMBER NOT NULL, 
SALARY NUMBER  
);

CREATE TABLE testTableNotNull (
    f0 emp_det NOT NULL
);


CREATE TABLE testTableNullable (
    f0 emp_det
);

-- as far as I see it, this should be a valid query, but I think it will fail 
because the attributes types will be different
-- saying I think, because I am not entirely sure how the validation of DML 
statements work
INSERT INTO testTableNotNull SELECT * FROM testTableNullable WHERE f0 IS NOT 
NULL;
{code}

As for the backwards compatibility. The current behaviour was introduced with 
[CALCITE-2464]. It was not present before. Honestly I don't know how many 
downstream projects use the structured types and depend on the current 
behaviour. I can speak only for Flink project and there as you can see we find 
the current behaviour problematic.


> Improve nullability support for fields of structured type
> ---------------------------------------------------------
>
>                 Key: CALCITE-4085
>                 URL: https://issues.apache.org/jira/browse/CALCITE-4085
>             Project: Calcite
>          Issue Type: Improvement
>          Components: core
>            Reporter: Dawid Wysakowicz
>            Assignee: Dawid Wysakowicz
>            Priority: Major
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> As discussed in 
> https://lists.apache.org/thread.html/r602ac95fff23dd1ef974fb396df7be061ab861384ec42f5c57ce0bc2%40%3Cdev.calcite.apache.org%3E
>  I would like to change the way a type of a field of a record is derived at 
> couple of locations. This helps frameworks such as Apache Flink to build 
> support for nullable records with not null fields.
> I suggest to change:
> * SqlDotOperator#deriveType
> * SqlItemOperator#inferReturnType
> * AliasNamespace#validateImpl
> * RexBuilder#makeFieldAccessInternal
> * SqlValidatorImpl.DeriveTypeVisitor#visit(SqlIdentifier)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

[jira] [Commented] (CALCITE-4085) Improve nullability support for fields of structured type

Reply via email to