Omega359 commented on code in PR #15361: URL: https://github.com/apache/datafusion/pull/15361#discussion_r2020191523
########## datafusion/functions/src/datetime/to_char.rs: ########## @@ -220,6 +221,13 @@ fn _to_char_scalar( } } + // eagerly cast Date32 values to Date64 to support date formatting with time-related specifiers + // without error. + if data_type == &Date32 { Review Comment: Some data from a quick little benchmark I wrote ``` test_date32_to_date64_cast_array_1000 time: [311.06 ns 314.48 ns 318.16 ns] Found 4 outliers among 100 measurements (4.00%) 1 (1.00%) high mild 3 (3.00%) high severe test_parse_performance_with_time_specifiers_array_1000 time: [343.79 ns 351.64 ns 359.98 ns] Found 14 outliers among 100 measurements (14.00%) 10 (10.00%) low mild 3 (3.00%) high mild 1 (1.00%) high severe test_parse_performance_without_time_specifiers_array_1000 time: [196.59 µs 201.06 µs 206.45 µs] Found 10 outliers among 100 measurements (10.00%) 2 (2.00%) high mild 8 (8.00%) high severe ``` test_date32_to_date64_cast_array_1000: just casts from date32 to date64 test_parse_performance_with_time_specifiers_array_1000: parses the formats looking for time specifiers and when found does the cast from date32 to date64 test_parse_performance_without_time_specifiers_array_1000: parses the formats looking for time specifiers (doesn't find anything), no cast Note that results on my machine will vary +-10% between runs. The check for time specifiers in the format is simple and could be improved but I think is enough to show general performance. Note that the list of time specifiers is not complete ``` const TIME_SPECIFIERS: &[&str] = &[ "%H", "%M", "%S", "%c", "%f", "%k", "%I", "%l", "%p", "%R", "%T", "%x", "%r", "%Z", ]; fn has_time_specifiers(str: &str) -> bool { for specifier in TIME_SPECIFIERS { if str.contains(specifier) { return true; } } false } ``` A couple of takeaways from this. casting is not as cheap as I thought however parsing is seems to be more expensive than that but perhaps with some really good optimizations it could be reduced. I don't see a good way to make this feature have no cost with Date32 && no time specifiers in the format. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org