ggjh-159 opened a new issue, #12249:
URL: https://github.com/apache/gluten/issues/12249
### Backend
VL (Velox)
### Bug description
# Bug: VARCHAR literal value includes charset prefix `_UTF-16LE'...'` in output
## Problem
When a SQL query contains VARCHAR string literals (e.g., `CASE WHEN ... THEN 'nightTime'`), the output produced by Gluten-Flink includes an unexpected charset prefix `_UTF-16LE'...'` around the string value. For example, `'nightTime'` is rendered as `_UTF-16LE'nightTime'` in the Print connector output.
Native Flink (without Gluten) outputs the correct plain string value `nightTime`.
## Root Cause
In `RexNodeConverter.toVariant()`, the VARCHAR branch calls `literal.getValue().toString()`. For a VARCHAR literal, `RexLiteral.getValue()` returns a Calcite `NlsString` object, and `NlsString.toString()` produces the SQL literal format with the charset prefix (`_UTF-16LE'value'`), not the raw string value.
The CHAR branch in the same method uses the correct API, `literal.getValueAs(String.class)`, which returns the plain string value.
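The difference between the two calls can be illustrated with a minimal, self-contained sketch. It does not use Calcite: `StubNlsString` is a hypothetical stand-in that only mimics the behavior described above (a `toString()` that renders the SQL literal form and a `getValue()` that returns the raw content), and `viaToString` / `viaGetValue` are illustrative names, not methods from `RexNodeConverter`.

```java
public class VarcharLiteralDemo {

    /** Hypothetical stand-in for Calcite's NlsString; NOT the real class. */
    static final class StubNlsString {
        private final String value;
        private final String charsetName;

        StubNlsString(String value, String charsetName) {
            this.value = value;
            this.charsetName = charsetName;
        }

        /** Raw string content, analogous to NlsString.getValue(). */
        String getValue() {
            return value;
        }

        /** SQL-literal rendering with a charset prefix, as the issue describes. */
        @Override
        public String toString() {
            return "_" + charsetName + "'" + value.replace("'", "''") + "'";
        }
    }

    /** Buggy pattern: stringifying the literal object leaks SQL literal syntax. */
    static String viaToString(Object literalValue) {
        return literalValue.toString();
    }

    /** Fixed pattern: unwrap the raw string content instead. */
    static String viaGetValue(StubNlsString literalValue) {
        return literalValue.getValue();
    }

    public static void main(String[] args) {
        StubNlsString literal = new StubNlsString("nightTime", "UTF-16LE");
        System.out.println(viaToString(literal)); // prints _UTF-16LE'nightTime'
        System.out.println(viaGetValue(literal)); // prints nightTime
    }
}
```

In the real planner, the analogous change would presumably be for the VARCHAR branch to call `literal.getValueAs(String.class)`, as the CHAR branch already does.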
## File
- `gluten-flink/planner/src/main/java/org/apache/gluten/rexnode/RexNodeConverter.java`, line 113
## Verification: Native Flink does NOT have this issue
Running the same q14 query on native Flink (without Gluten JARs) produces correct output with no charset prefix:
```
+I[1000, 2001, 26353920.936, nightTime, 2026-06-05T05:55:55.858, ..., 0]
```
This confirms the `_UTF-16LE` prefix is introduced by Gluten's planner, not by Flink itself.
## Reproduce
Run any Nexmark query that uses VARCHAR string literals in `CASE WHEN` expressions, e.g., Nexmark q14:
```sql
SELECT
    auction, bidder, 0.908 * price AS price,
    CASE
        WHEN HOUR(dateTime) >= 8 AND HOUR(dateTime) <= 18 THEN 'dayTime'
        WHEN HOUR(dateTime) <= 6 OR HOUR(dateTime) >= 20 THEN 'nightTime'
        ELSE 'otherTime'
    END AS bidTimeType,
    dateTime, extra,
    count_char(extra, 'c') AS c_counts
FROM bid
WHERE 0.908 * price > 1000000 AND 0.908 * price < 50000000
```
### Actual output (Gluten-Flink)
```
+I[1012, 2001, 28428278.716, _UTF-16LE'nightTime', 2026-06-05T02:24:06.630, ..., 0]
```
### Expected output (native Flink)
```
+I[1012, 2001, 28428278.716, nightTime, 2026-06-05T02:24:06.630, ..., 0]
```
## Environment
- Gluten-Flink: 1.7.0-SNAPSHOT
- Flink: 1.19.2
### Gluten version
_No response_
### Spark version
None
### Spark configurations
_No response_
### System information
_No response_
### Relevant logs
```bash
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]