byteink commented on issue #6492:
URL: 
https://github.com/apache/arrow-datafusion/issues/6492#issuecomment-1576183654

   I think there may be two issues that need to be considered.
   
   # 1. What is the schema of a table created with initial contents 
(particularly the behavior of nullability)
   ## MySQL
   Query expressions and column definitions can appear together in the create 
table statement, and MySQL will check if there is a conflict between the two.
   ```
   mysql> create table t1(a int not null) as select 1 as a;
   Query OK, 1 row affected (0.06 sec)
   Records: 1  Duplicates: 0  Warnings: 0
   
   mysql> describe t1;
   +-------+------+------+-----+---------+-------+
   | Field | Type | Null | Key | Default | Extra |
   +-------+------+------+-----+---------+-------+
   | a     | int  | NO   |     | NULL    |       |
   +-------+------+------+-----+---------+-------+
   1 row in set (0.00 sec)
   
   
   mysql> create table t2(a int not null) as select null as a;
   ERROR 1048 (23000): Column 'a' cannot be null
   
   ```
   
   
   ## PostgreSQL
   `SELECT INTO` and `CREATE TABLE AS` statements do not support specifying 
column constraints.
   It seems that the schema of the table created through these statements is 
always nullable.
   ```
   postgres => create table t1(a int not null);
   CREATE TABLE
   postgres => create table t2 as select * from t1;
   SELECT 0
   postgres => \d t2
                    Table "public.t2"
    Column |  Type   | Collation | Nullable | Default
   --------+---------+-----------+----------+---------
    a      | integer |           |          |
   
   postgres=> select * into t3 from t1;
   SELECT 0
   postgres=> \d t3
                    Table "public.t3"
    Column |  Type   | Collation | Nullable | Default
   --------+---------+-----------+----------+---------
    a      | integer |           |          |
   ```
   
   
   ## Datafusion
   > Now that we an add new data to MemTables, what do you think about ensuring 
the schema of the MemTable when created via INSERT ... is nullable?
   
   I think it is ok, just like PostgreSQL.
   
   But we should reject the following statement:
   ```
   DataFusion CLI v25.0.0
   ❯ create table t(a int not null) as select null as a;
   0 rows in set. Query took 0.005 seconds.
   ❯ select * from t;
   +---+
   | a |
   +---+
   |   |
   +---+
   1 row in set. Query took 0.005 seconds.
   ```
   The user specified that the column should not be nullable, but this 
specification did not take effect. 
   
   # 2. What kind of data can be inserted into MemoryTable
   I think using the comparison of whether the two schemas are the same may be 
too strict.
   Declaring the schema as nullable doesn't necessarily mean that the actual 
data is always NULL.
   
   The following example can run successfully in PostgreSQL, but not in 
Datafusion.
   ```
   postgres=> create table foo(a int, b int);
   CREATE TABLE
   postgres=> insert into foo values(1,10), (2, NULL);
   INSERT 0 2
   postgres=> create table bar(b int not null);
   CREATE TABLE
   
   postgres=> insert into bar select b from foo where a=1;
   INSERT 0 1
   postgres=> insert into bar select b from foo where a=2;
   ERROR:  null value in column "b" of relation "bar" violates not-null 
constraint
   DETAIL:  Failing row contains (null).
   ```
   We should verify the resulting data after executing the input's execution 
plan.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to