XiaodongHuan created SPARK-56819:
------------------------------------
Summary: Add an option to trim trailing spaces when reading CHAR columns
Key: SPARK-56819
URL: https://issues.apache.org/jira/browse/SPARK-56819
Project: Spark
Issue Type: Improvement
Components: SQL
Affects Versions: 4.1.1, 4.0.1
Environment: spark-4.0.1
Reporter: XiaodongHuan
Spark currently enforces CHAR(N) fixed-length semantics by padding CHAR values
on write, and by applying read-side padding when spark.sql.readSideCharPadding
is enabled. This differs from MySQL, where trailing spaces are normally removed
from CHAR values on retrieval unless the PAD_CHAR_TO_FULL_LENGTH SQL mode is
enabled.
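The two retrieval behaviors can be modeled in a few lines of plain Python. This
is only an illustrative sketch, not Spark or MySQL code: the helper names
write_char, read_padded and read_trimmed are hypothetical, and the model
assumes that trailing spaces are not significant when checking the declared
length.

```python
def write_char(value: str, n: int) -> str:
    """Model a CHAR(n) write: reject overlong values, pad short ones with spaces."""
    core = value.rstrip(" ")  # assumption: trailing spaces do not count toward n
    if len(core) > n:
        raise ValueError(f"value exceeds CHAR({n})")
    return core.ljust(n)


def read_padded(stored: str, n: int) -> str:
    """Fixed-length read: the value comes back at least n characters wide."""
    return stored.ljust(n)


def read_trimmed(stored: str) -> str:
    """Trimming read: trailing spaces are removed on retrieval."""
    return stored.rstrip(" ")


stored = write_char("ab", 5)
print(repr(read_padded(stored, 5)))
print(repr(read_trimmed(stored)))
```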
This difference makes MySQL-to-Spark migration harder for workloads that rely
on MySQL's default CHAR retrieval behavior. Users may observe different results
from functions such as length() and concat(), from comparisons in application
code, and in downstream BI/reporting queries, unless they manually wrap every
CHAR column with rtrim() in every query.
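To make the visible impact concrete, the sketch below compares a padded and a
trimmed CHAR(5) value once they have reached application code. It is plain
Python standing in for the application side, not Spark SQL, and the sample
value "ab" and the compare helper are made up for illustration.

```python
padded = "ab".ljust(5)         # what a fixed-length CHAR(5) read hands back
trimmed = padded.rstrip(" ")   # what a trimming read hands back


def compare(label, fn):
    """Apply the same operation to both variants of the value."""
    return (label, fn(padded), fn(trimmed))


rows = [
    compare("length", len),
    compare("concat", lambda s: s + "|x"),
    compare("equals 'ab'", lambda s: s == "ab"),
]

for label, with_padding, with_trim in rows:
    print(f"{label}: padded={with_padding!r} trimmed={with_trim!r}")
```

Without a trimming option, each of these differences has to be hidden by an
explicit rtrim() on the column in every query that feeds such code.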
This proposal is to add an opt-in SQL configuration that trims trailing spaces
from CHAR(N) columns/fields when reading table data. The default should
preserve the current Spark behavior for compatibility. The new option should
only affect CHAR types on the read path, and should not change VARCHAR/STRING
semantics or write-side CHAR/VARCHAR length checks.
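The intended scope could look roughly like the sketch below, which trims only
CHAR(N) fields of a row and leaves VARCHAR and STRING values untouched. It is a
plain-Python model under stated assumptions: read_row, the trim_char flag and
the type-name strings are hypothetical, no configuration key has been proposed
here, and nested fields are not handled.

```python
from typing import Dict


def read_row(row: Dict[str, str], schema: Dict[str, str],
             trim_char: bool = False) -> Dict[str, str]:
    """Return the row as read, trimming only CHAR(N) fields when trim_char is set."""
    out = {}
    for name, value in row.items():
        # "VARCHAR(10)" does not start with "CHAR(", so it is left alone.
        is_char = schema[name].upper().startswith("CHAR(")
        out[name] = value.rstrip(" ") if (trim_char and is_char) else value
    return out


schema = {"code": "CHAR(5)", "name": "VARCHAR(10)", "note": "STRING"}
row = {"code": "ab   ", "name": "bob  ", "note": "hi "}

print(read_row(row, schema))                  # default: current behavior
print(read_row(row, schema, trim_char=True))  # opt-in: only "code" is trimmed
```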
The interaction with the existing spark.sql.readSideCharPadding option should
be clearly defined, so users can choose between Spark's fixed-length CHAR
behavior and MySQL-compatible CHAR retrieval behavior.
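One possible way to define the interaction is to let the trim option take
precedence whenever both are enabled. The sketch below enumerates the four
combinations under that assumption. The precedence rule, the read_char helper
and the trim_on_read flag are all hypothetical and are not part of this
proposal's text; the stored value is deliberately unpadded to represent data
written by an external system.

```python
def read_char(stored: str, n: int, read_side_padding: bool,
              trim_on_read: bool) -> str:
    """Resolve the two read-path options for a single CHAR(n) value."""
    if trim_on_read:            # assumed rule: trimming wins over padding
        return stored.rstrip(" ")
    if read_side_padding:       # fixed-length behavior: pad up to n
        return stored.ljust(n)
    return stored               # neither option: return the stored bytes as-is


cases = {
    (pad, trim): read_char("ab", 3, pad, trim)
    for pad in (False, True)
    for trim in (False, True)
}

for (pad, trim), value in cases.items():
    print(f"padding={pad} trim={trim} -> {value!r}")
```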