github-actions[bot] commented on code in PR #62905:
URL: https://github.com/apache/doris/pull/62905#discussion_r3198903739


##########
regression-test/suites/variant_p0/doc_mode/test_variant_cast_strict.groovy:
##########
@@ -0,0 +1,106 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+// INSERT VALUES with explicit cast(<json> as variant) into a doc-mode
+// variant column previously aborted BE in MutableBlock::merge_impl.
+// The root cause: FE allowed implicit Variant->Variant cast even when
+// configurations (max_subcolumns_count, enable_doc_mode, ...) differ,
+// and BE has no real Variant->Variant conversion.
+//
+// Fix: FE rejects Variant->Variant cast when configurations differ.
+// User-visible behavior:
+//   * BAD : cast('<json>' as variant) with mismatched config -> 
AnalysisException
+//   * GOOD: '<json>'                                          -> auto-coerce 
to target
+//   * GOOD: cast('<json>' as variant<...matching...>)         -> direct
+suite("test_variant_cast_strict", "p0") {
+    // Use session variables to set variant defaults (column-level properties
+    // forbid setting max_subcolumns_count and enable_doc_mode together).
+    sql """ set default_variant_enable_doc_mode = true """
+    sql """ set default_variant_max_subcolumns_count = 37 """
+    sql """ set default_variant_doc_materialization_min_rows = 8 """
+    sql """ set default_variant_doc_hash_shard_count = 7 """
+
+    def t = "variant_cast_strict"
+    sql """ DROP TABLE IF EXISTS ${t} """
+    sql """
+        CREATE TABLE IF NOT EXISTS ${t} (
+            id bigint,
+            v variant
+        )
+        DUPLICATE KEY(`id`)
+        DISTRIBUTED BY HASH(id) BUCKETS 1
+        PROPERTIES("replication_num" = "1", "disable_auto_compaction" = 
"true");
+    """
+
+    def jsonValue = 
'{"anchors":{"common_int":150025,"phase_marker":"phase_a","present":true,"row_id":15001},"dynamic":{"path_00000":15001000,"path_00001":15001001},"parent":{"child":{"name":"phase_a_15001"}},"phase_a_small":{"leaf":15001}}'
+
+    // ---- Case 1: BAD path — implicit Variant->Variant with mismatched 
config should be
+    // rejected by FE. Before fix: BE aborts in MutableBlock::merge_impl. 
After fix: FE
+    // throws AnalysisException; BE never receives the plan.
+    // We force a config mismatch on `variant_doc_materialization_min_rows` 
(target=8 from
+    // session, source=999 here).
+    test {
+        sql """ insert into ${t} values (15001, cast('${jsonValue}' as 
variant<properties(
+            "variant_enable_doc_mode" = "true",
+            "variant_doc_materialization_min_rows" = "999",
+            "variant_doc_hash_shard_count" = "7"
+        )>)); """
+        exception "cast"
+    }
+
+    // ---- Case 2: GOOD — drop the cast, let FE coerce String -> target 
Variant directly.
+    sql """ insert into ${t} values (15002, '${jsonValue}'); """
+    qt_case2 """ select id, cast(v['anchors']['row_id'] as bigint) from ${t} 
where id = 15002; """
+
+    // ---- Case 3: GOOD — explicit cast with matching parameters (from same 
session).
+    // Bare `as variant` reads session-default config and matches target 
exactly.
+    sql """ insert into ${t} values (15003, cast('${jsonValue}' as variant)); 
"""
+    qt_case3 """ select id, cast(v['anchors']['row_id'] as bigint) from ${t} 
where id = 15003; """
+
+    // ---- Case 4: cross-table — different variant configs need an explicit 
JSONB hop.
+    def t_src = "variant_cast_strict_src"
+    // Create source table with NO doc-mode by clearing session vars first, 
then restore.
+    sql """ set default_variant_enable_doc_mode = false """
+    sql """ set default_variant_max_subcolumns_count = 0 """
+    sql """ DROP TABLE IF EXISTS ${t_src} """
+    sql """
+        CREATE TABLE IF NOT EXISTS ${t_src} (
+            id bigint,
+            v variant
+        )
+        DUPLICATE KEY(`id`)
+        DISTRIBUTED BY HASH(id) BUCKETS 1
+        PROPERTIES("replication_num" = "1", "disable_auto_compaction" = 
"true");
+    """
+    sql """ insert into ${t_src} values (15004, '${jsonValue}'); """
+    // Restore session vars so target's column-level config keeps matching.
+    sql """ set default_variant_enable_doc_mode = true """
+    sql """ set default_variant_max_subcolumns_count = 37 """
+
+    // 4a: direct copy is rejected (configs differ).
+    test {
+        sql """ insert into ${t} select id, v from ${t_src}; """
+        exception "cast"
+    }
+
+    // 4b: routing through JSONB works.
+    sql """ insert into ${t} select id, cast(cast(v as JSONB) as variant) from 
${t_src}; """
+    qt_case4b """ select id, cast(v['anchors']['row_id'] as bigint) from ${t} 
where id = 15004; """
+
+    sql """ DROP TABLE IF EXISTS ${t} """
+    sql """ DROP TABLE IF EXISTS ${t_src} """

Review Comment:
   The regression-test convention in this repository is to drop tables before 
use, not after completion, so the final state is preserved for debugging failed 
runs. This suite already drops both tables before creating them, so please 
remove the cleanup drops at the end.



##########
fe/fe-core/src/main/java/org/apache/doris/nereids/trees/plans/commands/insert/InsertIntoValuesAnalyzer.java:
##########
@@ -79,7 +81,8 @@ public class InsertIntoValuesAnalyzer extends 
AbstractBatchJobExecutor {
 
                     new 
RewriteInsertIntoExpressions(ExpressionRewrite.bottomUp(
                             ConvertAggStateCast.INSTANCE,
-                            FoldConstantRuleOnFE.PATTERN_MATCH_INSTANCE

Review Comment:
   This does not actually validate casts on the batch insert path after 
`PushProjectIntoUnion`. In `BATCH_INSERT_JOBS`, `BindSink` creates the 
target-column casts in a `LogicalProject`, then `PushProjectIntoUnion` pushes 
those projected expressions into `LogicalUnion.constantExprsList` and removes 
the project. However this custom `RewriteInsertIntoExpressions` only builds 
`ProjectExpressionRewrite` and `OneRowRelationExpressionRewrite`, unlike the 
normal `ExpressionRewrite` which has `LogicalSetOperationRewrite` to visit 
`LogicalUnion` constant expressions. As a result, any casts that were pushed 
into the union constants bypass the newly added `CheckCast.INSTANCE`, so batch 
INSERT VALUES still has the same unchecked-cast hole this PR is meant to close. 
Please either run `CheckCast` before pushing the project into the union, or 
extend this insert rewrite to also validate/rewrite the union constant 
expressions, and add a regression case that exercises the batch path.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to