hhhizzz opened a new pull request, #1317:
URL: https://github.com/apache/auron/pull/1317

   
   # Which issue does this PR close?
   
   
   Closes #1316.
   
   # Rationale for this change
     - String inputs with leading or trailing whitespace currently fail to cast 
to numeric types and yield nulls; Spark tolerates such whitespace, so Auron's 
extensions need parity.
     - Aligning the float and integer cast paths avoids surprising failures for 
users whose upstream systems send padded values that have not been fully 
cleaned.
   
   # What changes are included in this PR?
     - Route Utf8→float casts through a custom helper that trims whitespace 
before parsing and preserves support for scientific notation.
     Spark Ref: 
https://github.com/apache/spark/blob/589141e3085a2cb875cfda8530b800fc53ac019f/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala#L1108
     - Reuse the Spark-inspired integer parser while teaching it to trim 
leading/trailing whitespace, mirroring the new float behaviour.
     Spark Ref: 
https://github.com/apache/spark/blob/589141e3085a2cb875cfda8530b800fc53ac019f/common/unsafe/src/main/java/org/apache/spark/unsafe/types/UTF8String.java#L1648
     - Refactor the cast expression tests to share setup helpers and add 
coverage for trimmed inputs, both columnar and scalar.
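
The trim-then-parse idea behind the first two items can be sketched roughly as follows. This is an illustrative sketch, not the code in this PR: `parse_trimmed_f64` and `parse_trimmed_i32` are hypothetical names, and Rust's standard `str::trim` and `str::parse` stand in for the Spark-inspired parsers, which may accept or reject inputs differently.

```rust
/// Hypothetical helper (not the PR's actual function): trim surrounding
/// whitespace, then parse as f64. The standard parser already accepts
/// scientific notation such as "1.5e3".
fn parse_trimmed_f64(input: &str) -> Option<f64> {
    input.trim().parse::<f64>().ok()
}

/// Hypothetical helper: trim surrounding whitespace, then parse as i32.
/// The standard integer parser is only a stand-in for the Spark-inspired
/// parser mentioned above.
fn parse_trimmed_i32(input: &str) -> Option<i32> {
    input.trim().parse::<i32>().ok()
}
```

Under these assumptions, `parse_trimmed_f64("  1.5e3 ")` returns `Some(1500.0)`, while whitespace inside the value, as in `parse_trimmed_i32(" 4 2 ")`, still fails and returns `None`.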
   
   
   # Are there any user-facing changes?
     - Yes. Casting Utf8 strings that contain surrounding whitespace into 
numeric types now succeeds instead of yielding nulls, matching Spark casting 
semantics.
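
To illustrate the user-facing difference at the column level, here is a minimal sketch that models a Utf8 column as a slice of `Option<&str>`. The function `cast_column_to_f64` and its `trim` flag are hypothetical and exist only to contrast the old and new behaviour; the real implementation operates on Arrow arrays.

```rust
/// Hypothetical column cast: a slot becomes `None` when the input is null
/// or when the text does not parse, mirroring null-on-failure semantics.
/// The `trim` flag is illustrative only: `false` models the old behaviour,
/// `true` models the behaviour described in this PR.
fn cast_column_to_f64(values: &[Option<&str>], trim: bool) -> Vec<Option<f64>> {
    values
        .iter()
        .map(|slot| {
            slot.and_then(|s| {
                // Strip surrounding whitespace only when trimming is enabled.
                let text = if trim { s.trim() } else { s };
                text.parse::<f64>().ok()
            })
        })
        .collect()
}
```

For a column holding `" 1.5 "`, `"2e2"`, a null, and `"x"`, the untrimmed variant yields a null for the padded value, while the trimmed variant yields `1.5`; the unparsable `"x"` stays null in both cases.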
   
   
   # How was this patch tested?
     - Extended unit coverage in 
`native-engine/datafusion-ext-commons/src/arrow/cast.rs` and 
`native-engine/datafusion-ext-exprs/src/cast.rs` to exercise the new trimming 
logic for float, int, and scalar casts.
   ```shell
   cargo test -p datafusion-ext-commons cast
   ```
   

