parthchandra commented on PR #4256:
URL:
https://github.com/apache/datafusion-comet/pull/4256#issuecomment-4410306347
> While going through this with `/review-comet-pr` I dropped a small set of
edge-case SQL test files under
`spark/src/test/resources/sql-tests/expressions/datetime/edge_cases/` and ran
them against the PR locally with `-Pspark-4.1`. Posting the results in case
they're useful.
>
> **8 of 10 edge-case queries diverge from Spark.** The fractional-digit
truncation cases work fine. All eight failures are leading-whitespace variants:
>
> Query Spark This PR
> `to_time(' 12:30:45')` `12:30:45` `CometNativeException: cannot
be parsed to a TIME value`
> `to_time(' 12:30:45')` `12:30:45` same exception
> `to_time(' 12:30:45 ')` `12:30:45` same exception
> `to_time(' 1:00:00 PM')` `13:00` same exception
> `to_time('\t12:30:45')` `12:30:45` same exception
> `to_time('\n12:30:45')` `12:30:45` same exception
> `try_to_time(' 12:30:45')` `12:30:45` `null`
> `try_to_time(' 1:00:00 PM')` `13:00` `null`
> Passing edge cases (both engines agree):
>
> * `to_time('00:00:00.1234567')` and `to_time('12:30:45.999999999')`.
Fractional truncation past microseconds works correctly.
>
> **Root cause.** `string_to_time` in
`native/spark-expr/src/datetime_funcs/to_time.rs` only does `s.trim_end()`.
Spark's path is `stringToTime` then `parseTimestampString`
(`SparkDateTimeUtils.scala:493`), and `parseTimestampString` trims both ends
with `getTrimmedStart` / `getTrimmedEnd`, which use
`UTF8String.isWhitespaceOrISOControl` so tabs and newlines are stripped too. A
`trim` at the top of `string_to_time` (covering ASCII whitespace and ISO
control bytes) should close the gap.
>
> The unit test `test_leading_space_with_t_prefix` in `to_time.rs` is
correct in result for `" T12:30"` but its comment `Spark only right-trims` is
misleading: Spark does an additional left-trim inside `parseTimestampString`,
and that input is rejected because the `T`-prefix branch is gated on `j == 0`
rather than because of trimming. Worth rewording so future readers don't follow
the trail.
Thanks for catching this. Updated and added all the above as test cases
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]