[
https://issues.apache.org/jira/browse/CALCITE-4085?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17150092#comment-17150092
]
Dawid Wysakowicz commented on CALCITE-4085:
-------------------------------------------
I'd like to split my comment into two parts. First I'd like to put out what is
the minimal set of changes that would help us in Flink. Of course it is only my
perspective, and I do understand it might not be in line with Calcite's
goals/roadmap. In the second part I'd like to reply to the general problem how
the nullability of attributes is handled as I find it interesting. Still just
to emphasize it is not my goal to change the behaviour, even though personally
I'd find it helpful.
1.
In Flink we do want to preserve the nullability of attributes. Similarly to
Calcite's {{JavaType}}. We find that behaviour more intuitive and powerful. We
can use that for mapping to java types in e.g. UDFs, but also provide a more
fine grained constraints on Tables. We do not have problems with creating such
a type. The {{RelRecordType}} ctor is public after all. However we find the
accessor methods do not work with such types. The accessor method, as described
in the SQL standard, should return NULL on NULL input. Input in this case is
the structured type itself. If I am not wrong it is the
{{SqlTypeTransforms.TO_NULLABLE}} that implements that logic.
I disagree with the satement from [~danny0405]:
??my main concern is moving the nullability from type factory to all kinds of
sql contexts make the logic dispersed and fragile, it is hard to keep it right.
And it does not have good sustainability when new sql contexts are introduced,
i.e. new syntax or operators.??
The “RETURNS NULL ON NULL INPUT” property is a characteristic of a function,
not a type, and is defined by the SQL standard:
??A null-call function is an SQL-invoked function that is defined to return the
null value if any of its input arguments is the null value. A null-call
function is an SQL-invoked function whose <null-call clause> specifies “RETURNS
NULL ON NULL INPUT”.??
BTW you do not change the nullability of elements of ARRAY, MAP, MULTISET in
the factory, because the accessors return NULLABLE types. In order to achieve
the point one I identified a few locations which are problematic for us. I also
do not like that there is so many locations with that logic, but that's where
the "accessor" method is implemented.
* SqlDotOperator#deriveType -> accessor method for structured types that come
from an expression e.g. SELECT udf_returns_structured_type().field0;
* SqlItemOperator#inferReturnType -> accessor method for structured types with
an item notation e.g. SELECT udf_returns_structured_type()["field0"];
* AliasNamespace#validateImpl -> here the namespace drops the nullability (it
is a different problem from the rest)
* RexBuilder#makeFieldAccessInternal and
SqlValidatorImpl.DeriveTypeVisitor#visit(SqlIdentifier) -> accessor method for
structured types that come from table e.g. SELECT t.field0 FROM testTable t;
If you disagree with the changes I proposed in the PR, can you suggest a way
for us to implement that logic without copying Calcite classes (now we need to
copy over SqlDotOperator, SqlItemOperator, AliasNamespace).
2.
As for the general problem of attributes nullability. First of all thank you
[~zabetak] for the references.
The fragment that I find most applicable to our problem is:
Structured types can be used as the data types of variables, arguments and
parameters, and so forth, so permitting constraints to be put on their
attributes would violate that aspect of SQL's design. Instead, the designers
concluded that constraints on some column whose data type is some structured
type could be used to control the values allowed in the type's attributes, but
only for instances stored in that column.
Unfortunately it is not very definitive in my opinion. It would rather give me
the impression that you can put a NOT NULL constraint on an attribute by
putting it on the column. That's not how e.g. Oracle implements it. You can
e.g. do:
{code}
CREATE TYPE emp_det IS OBJECT
(
EMP_NO VARCHAR2(2),
EMP_NAME VARCHAR2(150),
MANAGER NUMBER,
SALARY NUMBER
);
CREATE TABLE testTable (
f0 emp_det NOT NULL,
abc NUMBER
);
INSERT INTO testTable (f0, abc) VALUES (null, 1); --> throws an exception, as
it validates the constraint
INSERT INTO testTable (f0, abc) VALUES (emp_det(null, null, null, null), 1); ->
works just fine
{code}
The fragment from CALCITE-2464 that provides a table that describes behaviour
of IS NULL/IS NOT NULL describes behaviour of row value predicand. It is not
equivalent to structure type. Therefore I don't think it applies to the problem
of structured types attributes. It can be e.g. a field reference:
{code}
<row value predicand> ::=
<row value special case>
| <row value constructor predicand>
<row value special case> ::=
<nonparenthesized value expression primary>
<nonparenthesized value expression primary> ::=
<unsigned value specification>
| <column reference>
| <set function specification>
| <window function>
| <nested window function>
| <scalar subquery>
| <case expression>
| <cast specification>
| <field reference>
| <subtype treatment>
| <method invocation>
| <static method invocation>
| <new specification>
| <attribute or method reference>
| <reference resolution>
| <collection value constructor>
| <array element reference>
| <multiset element reference>
| <next value expression>
| <routine invocation>
| <row pattern navigation operation>
{code}
I also think the current behaviour will not work in a following scenario:
{code}
CREATE TYPE emp_det IS OBJECT
(
EMP_NO VARCHAR2(2) NOT NULL,
EMP_NAME VARCHAR2(150),
MANAGER NUMBER NOT NULL,
SALARY NUMBER
);
CREATE TABLE testTableNotNull (
f0 emp_det NOT NULL
);
CREATE TABLE testTableNullable (
f0 emp_det
);
-- as far as I see it, this should be a valid query, but I think it will fail
because the attributes types will be different
-- saying I think, because I am not entirely sure how the validation of DML
statements work
INSERT INTO testTableNotNull SELECT * FROM testTableNullable WHERE f0 IS NOT
NULL;
{code}
As for the backwards compatibility. The current behaviour was introduced with
[CALCITE-2464]. It was not present before. Honestly I don't know how many
downstream projects use the structured types and depend on the current
behaviour. I can speak only for Flink project and there as you can see we find
the current behaviour problematic.
> Improve nullability support for fields of structured type
> ---------------------------------------------------------
>
> Key: CALCITE-4085
> URL: https://issues.apache.org/jira/browse/CALCITE-4085
> Project: Calcite
> Issue Type: Improvement
> Components: core
> Reporter: Dawid Wysakowicz
> Assignee: Dawid Wysakowicz
> Priority: Major
> Time Spent: 20m
> Remaining Estimate: 0h
>
> As discussed in
> https://lists.apache.org/thread.html/r602ac95fff23dd1ef974fb396df7be061ab861384ec42f5c57ce0bc2%40%3Cdev.calcite.apache.org%3E
> I would like to change the way a type of a field of a record is derived at
> couple of locations. This helps frameworks such as Apache Flink to build
> support for nullable records with not null fields.
> I suggest to change:
> * SqlDotOperator#deriveType
> * SqlItemOperator#inferReturnType
> * AliasNamespace#validateImpl
> * RexBuilder#makeFieldAccessInternal
> * SqlValidatorImpl.DeriveTypeVisitor#visit(SqlIdentifier)
--
This message was sent by Atlassian Jira
(v8.3.4#803005)