zzwqqq created CALCITE-7607:
-------------------------------
Summary: DML type coercion inserts implicit narrowing casts for
target-column assignments
Key: CALCITE-7607
URL: https://issues.apache.org/jira/browse/CALCITE-7607
Project: Calcite
Issue Type: Bug
Components: core
Reporter: zzwqqq
Assignee: zzwqqq
For {{{}INSERT{}}}, {{{}UPDATE{}}}, and {{{}MERGE{}}}, Calcite may coerce a
source
expression to the target column type if the two types are cast-compatible. This
happens even when the SQL does not contain an explicit {{{}CAST{}}}.
That can change DML assignment semantics. A target table may need to reject the
value, or may need to apply its own checked assignment rules. Calcite should not
turn a bare source expression into an ordinary cast unless the assignment is
safe to rewrite.
This includes length, precision, and range narrowing. For example, inserting
{{VARCHAR(64)}} into {{{}VARCHAR(16){}}}, {{DECIMAL(18, 4)}} into
{{{}DECIMAL(10, 2){}}}, or {{BIGINT}} into {{INTEGER}} should not be hidden by
an implicit cast in the DML plan. Several databases reject those assignments
when the actual value does not fit the target column.
Tests in postgresql, mysql and oracle:
[https://onecompiler.com/postgresql/44sf3dz68]
[https://onecompiler.com/mysql/44sf3efz4]
[https://onecompiler.com/oracle/44sf3ewmh]
h3. Steps to Reproduce
Add this case in {{{}TypeCoercionConverterTest{}}}:
{code:java}
@Test void testInsertVarcharNarrowingKeepsSourceType() {
final String sql = "insert into t1 select "
+ "CAST('AVeryLongLongStringValue' AS VARCHAR(64)), "
+ "t2_smallint, t2_int, t2_bigint,\n"
+ "t2_real, t2_double, t2_decimal, t2_timestamp, "
+ "t2_date, t2_binary, t2_boolean from t2";
sql(sql).ok();
}{code}
The plan is
{code:java}
LogicalTableModify(table=[[CATALOG, SALES, T1]], operation=[INSERT],
flattened=[false])
LogicalProject(t1_varchar20=['AVeryLongLongStringV':VARCHAR(20)],
t1_smallint=[$1], t1_int=[$2], t1_bigint=[$3], t1_real=[$4], t1_double=[$5],
t1_decimal=[$6], t1_timestamp=[$7], t1_date=[$8], t1_binary=[$9],
t1_boolean=[$10])
LogicalTableScan(table=[[CATALOG, SALES, T2]])
{code}
The source expression is {{{}CAST('AVeryLongLongStringValue' AS
VARCHAR(64)){}}},
but the plan has already truncated it to
{{{}'AVeryLongLongStringV':VARCHAR(20){}}}.
This bypasses target-column assignment checks.
Desired behavior: the plan keeps the source type:
{code:java}
Unable to find source-code formatter for language: text. Available languages
are: actionscript, ada, applescript, bash, c, c#, c++, cpp, css, erlang, go,
groovy, haskell, html, java, javascript, js, json, lua, none, nyan, objc, perl,
php, python, r, rainbow, ruby, scala, sh, sql, swift, visualbasic, xml,
yamlLogicalTableModify(table=[[CATALOG, SALES, T1]], operation=[INSERT],
flattened=[false])
LogicalProject(t1_varchar20=['AVeryLongLongStringValue':VARCHAR(64)],
t1_smallint=[$1], t1_int=[$2], t1_bigint=[$3], t1_real=[$4], t1_double=[$5],
t1_decimal=[$6], t1_timestamp=[$7], t1_date=[$8], t1_binary=[$9],
t1_boolean=[$10])
LogicalTableScan(table=[[CATALOG, SALES, T2]])
{code}
h3. Possible Fix
One possible approach is to keep store assignment separate from ordinary cast
coercion:
* In validation, use assignment compatibility for bare DML sources, and leave
ordinary cast compatibility to user-written {{CAST}} expressions.
* In query-source coercion, rewrite only assignments that are known to be safe
to rewrite.
* Preserve source/target assignment metadata through {{TableModify}} so later
planning and execution stages can still see the original source type.
* For Calcite's enumerable execution path, reject kept-source values that do
not fit the target column.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)