zzwqqq created CALCITE-7607:
-------------------------------

             Summary: DML type coercion inserts implicit narrowing casts for 
target-column assignments
                 Key: CALCITE-7607
                 URL: https://issues.apache.org/jira/browse/CALCITE-7607
             Project: Calcite
          Issue Type: Bug
          Components: core
            Reporter: zzwqqq
            Assignee: zzwqqq


For {{{}INSERT{}}}, {{{}UPDATE{}}}, and {{{}MERGE{}}}, Calcite may coerce a 
source
expression to the target column type if the two types are cast-compatible. This
happens even when the SQL does not contain an explicit {{{}CAST{}}}.

That can change DML assignment semantics. A target table may need to reject the
value, or may need to apply its own checked assignment rules. Calcite should not
turn a bare source expression into an ordinary cast unless the assignment is
safe to rewrite.

This includes length, precision, and range narrowing. For example, inserting
{{VARCHAR(64)}} into {{{}VARCHAR(16){}}}, {{DECIMAL(18, 4)}} into
{{{}DECIMAL(10, 2){}}}, or {{BIGINT}} into {{INTEGER}} should not be hidden by 
an implicit cast in the DML plan. Several databases reject those assignments 
when the actual value does not fit the target column.

Tests in postgresql, mysql and oracle:

[https://onecompiler.com/postgresql/44sf3dz68]

[https://onecompiler.com/mysql/44sf3efz4]

[https://onecompiler.com/oracle/44sf3ewmh]
h3. Steps to Reproduce

Add this case in {{{}TypeCoercionConverterTest{}}}:
{code:java}
@Test void testInsertVarcharNarrowingKeepsSourceType() {
final String sql = "insert into t1 select "
+ "CAST('AVeryLongLongStringValue' AS VARCHAR(64)), "
+ "t2_smallint, t2_int, t2_bigint,\n"
+ "t2_real, t2_double, t2_decimal, t2_timestamp, "
+ "t2_date, t2_binary, t2_boolean from t2";
sql(sql).ok();
}{code}
The plan is
{code:java}
LogicalTableModify(table=[[CATALOG, SALES, T1]], operation=[INSERT], 
flattened=[false])
  LogicalProject(t1_varchar20=['AVeryLongLongStringV':VARCHAR(20)], 
t1_smallint=[$1], t1_int=[$2], t1_bigint=[$3], t1_real=[$4], t1_double=[$5], 
t1_decimal=[$6], t1_timestamp=[$7], t1_date=[$8], t1_binary=[$9], 
t1_boolean=[$10])
    LogicalTableScan(table=[[CATALOG, SALES, T2]])
 {code}
The source expression is {{{}CAST('AVeryLongLongStringValue' AS 
VARCHAR(64)){}}},
but the plan has already truncated it to 
{{{}'AVeryLongLongStringV':VARCHAR(20){}}}.
This bypasses target-column assignment checks.

Desired behavior: the plan keeps the source type:
{code:java}
Unable to find source-code formatter for language: text. Available languages 
are: actionscript, ada, applescript, bash, c, c#, c++, cpp, css, erlang, go, 
groovy, haskell, html, java, javascript, js, json, lua, none, nyan, objc, perl, 
php, python, r, rainbow, ruby, scala, sh, sql, swift, visualbasic, xml, 
yamlLogicalTableModify(table=[[CATALOG, SALES, T1]], operation=[INSERT], 
flattened=[false])
LogicalProject(t1_varchar20=['AVeryLongLongStringValue':VARCHAR(64)], 
t1_smallint=[$1], t1_int=[$2], t1_bigint=[$3], t1_real=[$4], t1_double=[$5], 
t1_decimal=[$6], t1_timestamp=[$7], t1_date=[$8], t1_binary=[$9], 
t1_boolean=[$10])
LogicalTableScan(table=[[CATALOG, SALES, T2]])
{code}
h3. Possible Fix

One possible approach is to keep store assignment separate from ordinary cast
coercion:
 * In validation, use assignment compatibility for bare DML sources, and leave
ordinary cast compatibility to user-written {{CAST}} expressions.
 * In query-source coercion, rewrite only assignments that are known to be safe
to rewrite.
 * Preserve source/target assignment metadata through {{TableModify}} so later
planning and execution stages can still see the original source type.
 * For Calcite's enumerable execution path, reject kept-source values that do 
not fit the target column.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to