github-actions[bot] commented on code in PR #62699:
URL: https://github.com/apache/doris/pull/62699#discussion_r3308352801


##########
regression-test/suites/inverted_index_p0/test_tokenize.groovy:
##########
@@ -128,4 +128,84 @@ suite("test_tokenize"){
     qt_tokenize_sql """SELECT TOKENIZE('1・2', 
'"parser"="ik","parser_mode"="ik_max_word"');"""
     qt_tokenize_sql """SELECT TOKENIZE('abcşīabc', 
'"parser"="ik","parser_mode"="ik_max_word"');"""
 
+    // Regression for the case where the first argument is a constant literal
+    // while the second argument is a non-const column. The generic
+    // PreparedFunctionImpl::default_implementation_for_constant_arguments path
+    // only fires when all arguments are constant, so the example from the bug
+    // description (`SELECT tokenize('hello', '"parser"="english"') FROM t`)
+    // never actually reaches FunctionTokenize::execute_impl. To exercise the
+    // left_const branch added by this fix we need a non-const second argument;
+    // we get that by reading the properties string from a table column.
+    //
+    // Before the fix _do_tokenize / _do_tokenize_none would produce a
+    // dest_column with only 1 row (because unpack_if_const unwrapped the
+    // ColumnConst of the first arg to its inner 1-row data column), so the
+    // outer block ended up with a row count of 1 instead of input_rows_count.
+    def constArgTbl = "tokenize_const_first_arg"
+    sql "DROP TABLE IF EXISTS ${constArgTbl}"
+    sql """
+    CREATE TABLE IF NOT EXISTS ${constArgTbl}(
+      `id` int NULL,
+      `parser_config` text NULL
+    ) ENGINE=OLAP
+    DUPLICATE KEY(`id`)
+    DISTRIBUTED BY HASH(`id`) BUCKETS 1
+    PROPERTIES("replication_allocation" = "tag.location.default: 1");
+    """
+    sql """INSERT INTO ${constArgTbl} VALUES
+           (1, '"parser"="english"'),
+           (2, '"parser"="english"'),
+           (3, '"parser"="english"'),
+           (4, '"parser"="english"'),
+           (5, '"parser"="english"');"""
+
+    // Row count must match the source table's row count, not collapse to 1.
+    def cntEnglish = sql """SELECT COUNT(*) FROM

Review Comment:
   This regression will not reach the BE path it is trying to test. 
`Tokenize.checkLegalityBeforeTypeCoercion()` rejects any second argument that 
is not a `StringLikeLiteral` with `tokenize second argument must be string 
literal`, but this query passes the `parser_config` column. That means the 
suite fails during analysis (or, if this legality were bypassed, the function 
would still only parse `col_right->get_data_at(0)` for the whole block). Please 
cover a legal query shape that actually reaches the changed `left_const` 
branch, or change the FE legality/BE semantics together if non-literal parser 
configs are intended to be supported.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to