Omega359 commented on code in PR #15361:
URL: https://github.com/apache/datafusion/pull/15361#discussion_r2020191523


##########
datafusion/functions/src/datetime/to_char.rs:
##########
@@ -220,6 +221,13 @@ fn _to_char_scalar(
         }
     }
 
+    // eagerly cast Date32 values to Date64 to support date formatting with 
time-related specifiers
+    // without error.
+    if data_type == &Date32 {

Review Comment:
   Some data from a quick little benchmark I wrote 
   ```
   test_date32_to_date64_cast_array_1000
                           time:   [311.06 ns 314.48 ns 318.16 ns]
   Found 4 outliers among 100 measurements (4.00%)
     1 (1.00%) high mild
     3 (3.00%) high severe
   
   test_parse_performance_with_time_specifiers_array_1000
                           time:   [343.79 ns 351.64 ns 359.98 ns]
   Found 14 outliers among 100 measurements (14.00%)
     10 (10.00%) low mild
     3 (3.00%) high mild
     1 (1.00%) high severe
   
   test_parse_performance_without_time_specifiers_array_1000
                           time:   [196.59 µs 201.06 µs 206.45 µs]
   Found 10 outliers among 100 measurements (10.00%)
     2 (2.00%) high mild
     8 (8.00%) high severe
   ```
   test_date32_to_date64_cast_array_1000: just casts from date32 to date64
   
   test_parse_performance_with_time_specifiers_array_1000: parses the formats 
looking for time specifiers and when found does the cast from date32 to date64
   
   test_parse_performance_without_time_specifiers_array_1000: parses the 
formats looking for time specifiers (doesn't find anything), no cast
   
   Note that results on my machine will vary +-10% between runs. The check for 
time specifiers in the format is simple and could be improved but I think is 
enough to show general performance. Note that the list of time specifiers is 
not complete
   ```
   const TIME_SPECIFIERS: &[&str] = &[
       "%H", "%M", "%S", "%c", "%f", "%k", "%I", "%l", "%p", "%R", "%T", "%x", 
"%r", "%Z",
   ];
   
   fn has_time_specifiers(str: &str) -> bool {
       for specifier in TIME_SPECIFIERS {
           if str.contains(specifier) {
               return true;
           }
       }
   
       false
   }
   ```
   
   A couple of takeaways from this. casting is not as cheap as I thought 
however parsing is seems to be more expensive than that but perhaps with some 
really good optimizations it could be reduced.
   
   I don't see a good way to make this feature have no cost with Date32 && no 
time specifiers in the format.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org

Reply via email to