parthchandra commented on code in PR #3730:
URL: https://github.com/apache/datafusion-comet/pull/3730#discussion_r2976308663
##########
native/spark-expr/src/conversion_funcs/string.rs:
##########
@@ -959,8 +974,20 @@ fn get_timestamp_values<T: TimeZone>(
timestamp_type: &str,
tz: &T,
) -> SparkResult<Option<i64>> {
- let values: Vec<_> = value.split(['T', '-', ':', '.']).collect();
- let year = values[0].parse::<i32>().unwrap_or_default();
+ // Handle negative year: strip leading '-' and remember the sign.
+ let (sign, date_part) = if let Some(stripped) = value.strip_prefix('-') {
+ (-1i32, stripped)
+ } else {
+ (1i32, value)
+ };
+ let values: Vec<_> = date_part.split(['T', ' ', '-', ':', '.']).collect();
Review Comment:
Will try to improve this in a followup
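For context, here is an illustrative, self-contained sketch (not the PR's exact code) of why the sign must be handled before splitting: the leading '-' of a negative year is swallowed by the '-' separator, leaving an empty first token, so the year would silently default to 0.

```rust
// Illustrative sketch of the negative-year handling in the patch above.
fn main() {
    let raw = "-0100-01-01";

    // Old approach: the sign is consumed by the '-' separator.
    let naive: Vec<&str> = raw.split(['T', '-', ':', '.']).collect();
    assert_eq!(naive[0], "");
    // parse::<i32>() on "" fails, so unwrap_or_default() yields 0.
    assert_eq!(naive[0].parse::<i32>().unwrap_or_default(), 0);

    // Patched approach: strip the sign first, then split.
    let (sign, date_part) = match raw.strip_prefix('-') {
        Some(stripped) => (-1i32, stripped),
        None => (1i32, raw),
    };
    let values: Vec<&str> = date_part.split(['T', ' ', '-', ':', '.']).collect();
    let year = sign * values[0].parse::<i32>().unwrap_or_default();
    assert_eq!(year, -100);
}
```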
##########
native/spark-expr/src/conversion_funcs/string.rs:
##########
@@ -1092,40 +1109,119 @@ fn timestamp_parser<T: TimeZone>(
if value.is_empty() {
return Ok(None);
}
- // Define regex patterns and corresponding parsing functions
- let patterns = &[
+
+    // Handle Z or ±HH:MM offset suffix: strip it and parse with the explicit fixed offset.
+ if let Some((stripped, offset_secs)) = extract_offset_suffix(value) {
+ let fixed_tz = chrono::FixedOffset::east_opt(offset_secs)
+        .ok_or_else(|| SparkError::Internal("Invalid timezone offset".to_string()))?;
+ return timestamp_parser_with_tz(stripped, eval_mode, &fixed_tz);
+ }
+
+ timestamp_parser_with_tz(value, eval_mode, tz)
+}
+
+/// If `value` ends with a UTC offset suffix (`Z`, `+HH:MM`, or `-HH:MM`), returns the
+/// stripped string and the offset in seconds. Returns `None` if no offset suffix is present.
Review Comment:
We'll return None and eventually fail. The corresponding code in Spark, `SparkDateTimeUtils.getZoneId`, indicates that these formats (H:mm and HH:m) are retained for backward compatibility with Spark before 3.0. Let me create an issue to handle those as well as named timezones (e.g. 'America/New_York'), which are less commonly used.

There is a Spark test `test("SPARK-35780: support full range of timestamp string")` which currently passes. I'll double-check in the next PR to make sure it is being exercised. The test does not include named timezones, so perhaps that is not needed.
##########
native/spark-expr/src/conversion_funcs/string.rs:
##########
@@ -1092,40 +1109,119 @@ fn timestamp_parser<T: TimeZone>(
if value.is_empty() {
return Ok(None);
}
- // Define regex patterns and corresponding parsing functions
- let patterns = &[
+
+    // Handle Z or ±HH:MM offset suffix: strip it and parse with the explicit fixed offset.
+ if let Some((stripped, offset_secs)) = extract_offset_suffix(value) {
+ let fixed_tz = chrono::FixedOffset::east_opt(offset_secs)
+        .ok_or_else(|| SparkError::Internal("Invalid timezone offset".to_string()))?;
+ return timestamp_parser_with_tz(stripped, eval_mode, &fixed_tz);
+ }
+
+ timestamp_parser_with_tz(value, eval_mode, tz)
+}
+
+/// If `value` ends with a UTC offset suffix (`Z`, `+HH:MM`, or `-HH:MM`), returns the
+/// stripped string and the offset in seconds. Returns `None` if no offset suffix is present.
Review Comment:
Issue to support more formats -
https://github.com/apache/datafusion-comet/issues/3775
##########
native/spark-expr/src/conversion_funcs/string.rs:
##########
@@ -1027,28 +1054,19 @@ fn parse_timestamp_to_micros<T: TimeZone>(
timestamp_info.second,
);
- // Check if datetime is not None
- let tz_datetime = match datetime.single() {
+    // Spark uses the offset before a daylight saving change, so we need to use earliest().
+    // Return None for LocalResult::None, which is an invalid time in a DST spring-forward gap.
Review Comment:
There are tests in Spark's DateTimeUtilsSuite which do.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]