[PR] [AURON #1680] initCap semantics are aligned with Spark [auron]

via GitHub Sun, 30 Nov 2025 16:34:26 -0800


yew1eb opened a new pull request, #1681:
URL: https://github.com/apache/auron/pull/1681


    
   
   # Which issue does this PR close?
   
    
   
   Closes #1680.
   
    # Rationale for this change
    The current initcap implementation delegates to DataFusion's initcap, which 
does not match Spark's semantics. Spark treats only the space character as a word 
boundary, title-cases the first letter of each word, and lowercases the rest.  
   
   # What changes are included in this PR?
    + Implement a new native initcap function aligned with Spark, following the 
logic of Spark's implementation: 
`string.asInstanceOf[UTF8String].toLowerCase.toTitleCase`.
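
The lowercase-then-title-case logic described above can be illustrated with a minimal Rust sketch. This is not the code added in this PR: `spark_like_initcap` is a hypothetical name, and because Rust's standard library has no title-case mapping, `char::to_uppercase` stands in for Spark's `toTitleCase` as an approximation (the two differ for a small number of characters).

```rust
/// Hypothetical helper illustrating Spark-like initcap semantics.
/// Assumption: uppercase is used in place of true Unicode title case.
fn spark_like_initcap(input: &str) -> String {
    // Step 1: lowercase the whole string, as in `toLowerCase`.
    let lowered = input.to_lowercase();
    let mut out = String::with_capacity(lowered.len());
    // Step 2: capitalize the first character of the string and every
    // character that directly follows a space. Only ' ' is a boundary;
    // tabs, hyphens, and other punctuation are not.
    let mut at_word_start = true;
    for ch in lowered.chars() {
        if at_word_start {
            out.extend(ch.to_uppercase());
        } else {
            out.push(ch);
        }
        at_word_start = ch == ' ';
    }
    out
}

fn main() {
    println!("{}", spark_like_initcap("hELLO wORLD")); // Hello World
    println!("{}", spark_like_initcap("a-b,c d")); // A-b,c D
}
```

In the second call, `-` and `,` do not start a new word, so only the characters after the start of the string and after the space are capitalized.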
   
   # Are there any user-facing changes?
    Yes. initcap results will now match Spark's semantics.  
    
   
   # How was this patch tested?
    Added unit tests covering ASCII/non-ASCII, punctuation, space-only 
boundaries, and edge cases.  
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
