kosiew opened a new pull request, #22388:
URL: https://github.com/apache/datafusion/pull/22388

   ## Which issue does this PR close?
   
   Closes #22163
   
   ## Rationale for this change
   
   `ConversionSpecifier::format` in
   `datafusion/spark/src/function/string/format_string.rs` contained substantial
   duplication across integer scalar variants (`Int8` through `UInt64`). Each
   variant repeated nearly identical formatting logic for `%d`, `%x`, `%o`, `%s`,
   and `%c`.
   
   This refactor consolidates integer formatting behavior into shared helper
   paths to reduce maintenance overhead and lower the risk of drift between signed
   and unsigned integer handling, while preserving existing Spark-compatible
   behavior and null semantics.
   
   ## What changes are included in this PR?
   
   * Introduced `IntegerValue` and `IntegerFormatValue` helper enums to
     normalize signed and unsigned integer formatting behavior.
   * Added a shared `format_integer` helper on `ConversionSpecifier` to
     centralize integer conversion dispatch.
   * Consolidated `%c` formatting through `IntegerValue::to_char()` using the
     existing `signed_to_char` and `unsigned_to_char` helpers.
   * Replaced duplicated per-type integer formatting match arms with shared
     dispatch logic for:
   
     * `%d`
     * `%x`
     * `%o`
     * `%s`
     * `%c`
   * Preserved existing null handling and invalid conversion error behavior.
   * Added a table-driven regression test covering formatting equivalence
     across signed and unsigned integer widths, including `%c` and null handling.
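
The consolidation described in the list above can be illustrated with a minimal sketch. This is not the code from the PR: the variant names `Signed` and `Unsigned`, the free-function form of `format_integer`, its signature, and the `String` error type are all assumptions made for illustration. Null handling and `%c` are left out.

```rust
/// Hypothetical stand-in for the PR's `IntegerValue`: each integer width is
/// widened into one of two variants so formatting logic is written only once.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum IntegerValue {
    Signed(i64),
    Unsigned(u64),
}

// Generate the `From` impls for every width instead of hand-writing eight.
macro_rules! impl_from_int {
    ($variant:ident, $wide:ty, $($narrow:ty),+) => {
        $(
            impl From<$narrow> for IntegerValue {
                fn from(v: $narrow) -> Self {
                    IntegerValue::$variant(<$wide>::from(v))
                }
            }
        )+
    };
}

impl_from_int!(Signed, i64, i8, i16, i32, i64);
impl_from_int!(Unsigned, u64, u8, u16, u32, u64);

/// Assumed shape of a shared dispatch helper (error type simplified to `String`).
pub fn format_integer(value: IntegerValue, conversion: char) -> Result<String, String> {
    use IntegerValue::{Signed, Unsigned};
    match (conversion, value) {
        ('d' | 's', Signed(v)) => Ok(v.to_string()),
        ('d' | 's', Unsigned(v)) => Ok(v.to_string()),
        // Caveat of this sketch: a negative value prints as 64-bit two's
        // complement because the original width is discarded on widening.
        ('x', Signed(v)) => Ok(format!("{v:x}")),
        ('x', Unsigned(v)) => Ok(format!("{v:x}")),
        ('o', Signed(v)) => Ok(format!("{v:o}")),
        ('o', Unsigned(v)) => Ok(format!("{v:o}")),
        (other, _) => Err(format!("unsupported integer conversion: %{other}")),
    }
}
```

Under these assumptions, `format_integer(IntegerValue::from(255u8), 'x')` returns `Ok("ff")`, and the same call path serves every width from `i8` to `u64`. A width-aware implementation would need to retain the original type for negative `%x` and `%o` inputs, which this sketch does not attempt.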
   
   ## Are these changes tested?
   
   Yes.
   
   Added:
   
   * `test_integer_formatting_across_widths`
   
   Existing `%c` validation tests remain in place, including coverage for:
   
   * Invalid Unicode code points
   * Surrogate ranges
   * Negative values
   * Valid `%c` formatting behavior
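
The `%c` cases listed above map onto Rust's standard `char::from_u32` check. The sketch below reuses the helper names `signed_to_char` and `unsigned_to_char` from this PR, but their signatures, bodies, and the `String` error type are assumptions for illustration only, not the actual implementation.

```rust
/// Assumed signature: convert a signed integer to a `char`, rejecting
/// negative values and anything that is not a Unicode scalar value.
pub fn signed_to_char(v: i64) -> Result<char, String> {
    // Negative values (and values above u32::MAX) fail the conversion here.
    let code = u32::try_from(v).map_err(|_| format!("invalid code point: {v}"))?;
    // `char::from_u32` returns `None` for surrogates (0xD800..=0xDFFF)
    // and for values above 0x10FFFF.
    char::from_u32(code).ok_or_else(|| format!("invalid code point: {v}"))
}

/// Assumed signature: the unsigned counterpart, sharing the same validation.
pub fn unsigned_to_char(v: u64) -> Result<char, String> {
    let code = u32::try_from(v).map_err(|_| format!("invalid code point: {v}"))?;
    char::from_u32(code).ok_or_else(|| format!("invalid code point: {v}"))
}
```

In this sketch, `signed_to_char(65)` yields `Ok('A')`, while `-1`, `0xD800`, and `0x110000` are all rejected.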
   
   Suggested focused validation (runs the tests whose names contain `format_char`):
   
   * `cargo test -p datafusion-spark format_char`
   
   ## Are there any user-facing changes?
   
   No.
   
   This PR is a structural refactor intended to preserve existing formatting
   behavior and Spark compatibility semantics.
   
   ## LLM-generated code disclosure
   
   This PR includes LLM-generated code and comments. All LLM-generated content
   has been manually reviewed and tested.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

