XiaodongHuan created SPARK-56819:
------------------------------------

             Summary: Add an option to trim trailing spaces when reading CHAR columns
                 Key: SPARK-56819
                 URL: https://issues.apache.org/jira/browse/SPARK-56819
             Project: Spark
          Issue Type: Improvement
          Components: SQL
    Affects Versions: 4.1.1, 4.0.1
         Environment: spark-4.0.1
            Reporter: XiaodongHuan


Spark currently enforces CHAR(N) fixed-length semantics by padding CHAR values 
on write, and by applying read-side padding when spark.sql.readSideCharPadding 
is enabled. This differs from MySQL, which by default strips trailing spaces 
from CHAR values on retrieval unless the PAD_CHAR_TO_FULL_LENGTH SQL mode is 
enabled.
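
To make the difference concrete, here is a minimal plain-Python model of the 
two retrieval behaviors. It does not call Spark or MySQL; pad_char and 
mysql_default_read are hypothetical helper names used only for illustration.

```python
def pad_char(value: str, n: int) -> str:
    """Model a fixed-length CHAR(n) value: right-pad with spaces to n characters."""
    return value.ljust(n, " ")


def mysql_default_read(stored: str) -> str:
    """Model MySQL's default CHAR retrieval: trailing spaces are removed."""
    return stored.rstrip(" ")


stored = pad_char("ab", 5)               # what a CHAR(5) column holds after padding
padded_len = len(stored)                 # length seen with fixed-length semantics
trimmed_len = len(mysql_default_read(stored))  # length seen with MySQL's default

# Concatenation exposes the padding in the same way.
padded_concat = stored + "|"
trimmed_concat = mysql_default_read(stored) + "|"
```

In this model the same stored value reports a length of 5 with padding and 2 
after trimming, which is the kind of divergence migrated queries would see.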

This difference makes MySQL-to-Spark migration harder for workloads that rely 
on MySQL's default CHAR retrieval behavior. Unless users manually wrap CHAR 
columns with rtrim() in every query, they may observe different results from 
functions such as length() and concat(), from comparisons in application code, 
and in downstream BI/reporting queries.

This proposal is to add an opt-in SQL configuration that trims trailing spaces 
from CHAR(N) columns/fields when reading table data. The default should 
preserve the current Spark behavior for compatibility. The new option should 
only affect CHAR types on the read path, and should not change VARCHAR/STRING 
semantics or write-side CHAR/VARCHAR length checks.
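
The intended scope can be sketched in plain Python as follows. This is only an 
illustration of the requested semantics, not Spark code: the Column class, the 
read_cell function, and the trim_char_on_read flag are all hypothetical names, 
and no configuration key is proposed here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Column:
    name: str
    type_name: str  # "char", "varchar", or "string" in this simplified model


def read_cell(value: Optional[str], column: Column,
              trim_char_on_read: bool = False) -> Optional[str]:
    """Return the value as a reader would see it.

    trim_char_on_read is a hypothetical opt-in flag; the default (False)
    keeps the current behavior. Only CHAR columns are ever trimmed.
    """
    if trim_char_on_read and column.type_name == "char" and value is not None:
        return value.rstrip(" ")
    return value


schema = [Column("code", "char"), Column("note", "varchar")]
row = {"code": "ab   ", "note": "hi  "}

default_row = {c.name: read_cell(row[c.name], c) for c in schema}
trimmed_row = {c.name: read_cell(row[c.name], c, trim_char_on_read=True)
               for c in schema}
```

With the flag off, the row is returned unchanged. With it on, only the CHAR 
column loses its trailing spaces; the VARCHAR value keeps them.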

The interaction with the existing spark.sql.readSideCharPadding option should 
be clearly defined, so users can choose between Spark's fixed-length CHAR 
behavior and MySQL-compatible CHAR retrieval behavior.
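
One possible way to define the interaction is sketched below in plain Python. 
The precedence rule (trimming wins when both options are enabled) is an 
assumption made for illustration, not a decided design, and 
effective_char_read and trim_on_read are hypothetical names.

```python
from itertools import product


def effective_char_read(stored: str, n: int,
                        read_side_padding: bool, trim_on_read: bool) -> str:
    """Model the value returned for a CHAR(n) column under both options.

    Assumed rule: the hypothetical trim option takes precedence over
    read-side padding when both are enabled.
    """
    if trim_on_read:
        return stored.rstrip(" ")
    if read_side_padding:
        return stored.ljust(n, " ")
    return stored


# "ab" models a value written without padding, e.g. by an external system.
results = {
    (padding, trim): effective_char_read("ab", 5, padding, trim)
    for padding, trim in product([False, True], repeat=2)
}
```

Enumerating all four combinations like this makes it easy to state the chosen 
behavior explicitly in the documentation for the new option.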



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
