ggjh-159 opened a new issue, #12249:
URL: https://github.com/apache/gluten/issues/12249

   ### Backend
   
   VL (Velox)
   
   ### Bug description
   
   # Bug: VARCHAR literal value includes charset prefix `_UTF-16LE'...'` in output
   
   ## Problem
   
   When a SQL query contains VARCHAR string literals (e.g., `CASE WHEN ... THEN 'nightTime'`), the
   output produced by Gluten-Flink includes an unexpected charset prefix `_UTF-16LE'...'` around the
   string value. For example, `'nightTime'` is rendered as `_UTF-16LE'nightTime'` in the Print
   connector output.
   
   Native Flink (without Gluten) outputs the correct plain string value `nightTime`.
   
   ## Root Cause
   
   In `RexNodeConverter.toVariant()`, the VARCHAR branch calls `literal.getValue().toString()`. For
   VARCHAR type, `RexLiteral.getValue()` returns a Calcite `NlsString` object. `NlsString.toString()`
   produces the SQL literal format with charset prefix: `_UTF-16LE'value'`, not the raw string value.
   
   The CHAR branch in the same method uses the correct API: `literal.getValueAs(String.class)`, which
   returns the plain string value.
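   
   The difference between the two branches can be illustrated with a minimal, dependency-free
   sketch. `StubNlsString` below is a hypothetical stand-in for Calcite's `NlsString` (it is not the
   real class), and `buggyConvert` / `fixedConvert` are illustrative names, not methods in
   `RexNodeConverter`. The stub only reproduces the behavior described above, under the assumption
   that the charset is `UTF-16LE`.
   
   ```java
   import java.util.Objects;

   public class VarcharLiteralSketch {

       /** Hypothetical stand-in for Calcite's NlsString; NOT the real class. */
       static final class StubNlsString {
           private final String value;
           private final String charsetName;

           StubNlsString(String value, String charsetName) {
               this.value = Objects.requireNonNull(value);
               this.charsetName = Objects.requireNonNull(charsetName);
           }

           /** Raw string content, without quotes or charset prefix. */
           String getValue() {
               return value;
           }

           /** SQL-literal style rendering, as described in this issue. */
           @Override
           public String toString() {
               return "_" + charsetName + "'" + value + "'";
           }
       }

       /** Mirrors the VARCHAR branch: stringifies the wrapper object itself. */
       static String buggyConvert(StubNlsString literalValue) {
           return literalValue.toString();
       }

       /** Mirrors the intent of the CHAR branch: extracts the plain string. */
       static String fixedConvert(StubNlsString literalValue) {
           return literalValue.getValue();
       }

       public static void main(String[] args) {
           StubNlsString literal = new StubNlsString("nightTime", "UTF-16LE");
           System.out.println(buggyConvert(literal)); // _UTF-16LE'nightTime'
           System.out.println(fixedConvert(literal)); // nightTime
       }
   }
   ```
   
   In the real planner code, the corresponding change would presumably be to have the VARCHAR
   branch call `literal.getValueAs(String.class)`, as the CHAR branch already does.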
   
   ## File
   
   - `gluten-flink/planner/src/main/java/org/apache/gluten/rexnode/RexNodeConverter.java`, line 113
   
   ## Verification: Native Flink does NOT have this issue
   
   Running the same q14 query on native Flink (without Gluten JARs) produces correct output with no
   charset prefix:
   
   ```
   +I[1000, 2001, 26353920.936, nightTime, 2026-06-05T05:55:55.858, ..., 0]
   ```
   
   This confirms the `_UTF-16LE` prefix is introduced by Gluten's planner, not by Flink itself.
   
   ## Reproduce
   
   Run any Nexmark query with VARCHAR string literals in CASE WHEN expressions, e.g., Nexmark q14:
   
   ```sql
   SELECT
       auction, bidder, 0.908 * price as price,
       CASE
           WHEN HOUR(dateTime) >= 8 AND HOUR(dateTime) <= 18 THEN 'dayTime'
           WHEN HOUR(dateTime) <= 6 OR HOUR(dateTime) >= 20 THEN 'nightTime'
           ELSE 'otherTime'
       END AS bidTimeType,
       dateTime, extra,
       count_char(extra, 'c') AS c_counts
   FROM bid
   WHERE 0.908 * price > 1000000 AND 0.908 * price < 50000000
   ```
   
   ### Actual output (Gluten-Flink)
   
   ```
   +I[1012, 2001, 28428278.716, _UTF-16LE'nightTime', 2026-06-05T02:24:06.630, ..., 0]
   ```
   
   ### Expected output (native Flink)
   
   ```
   +I[1012, 2001, 28428278.716, nightTime, 2026-06-05T02:24:06.630, ..., 0]
   ```
   
   ## Environment
   
   - Gluten-Flink: 1.7.0-SNAPSHOT
   - Flink: 1.19.2
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

