parthchandra commented on code in PR #383: URL: https://github.com/apache/datafusion-comet/pull/383#discussion_r1594382358
##########
core/src/execution/datafusion/expressions/cast.rs:
##########
@@ -954,13 +993,63 @@ fn parse_str_to_time_only_timestamp(value: &str) -> CometResult<Option<i64>> {
     Ok(Some(timestamp))
 }

+fn date_parser(value: &str, eval_mode: EvalMode) -> CometResult<Option<i32>> {

Review Comment:
I wasn't familiar with Spark's string-to-date conversion, so I took a look (https://github.com/apache/spark/blob/9d79ab42b127d1a12164cec260bfbd69f6da8b74/sql/api/src/main/scala/org/apache/spark/sql/catalyst/util/SparkDateTimeUtils.scala#L312). According to the comment there, the allowed formats are:
```
* `[+-]yyyy*`
* `[+-]yyyy*-[m]m`
* `[+-]yyyy*-[m]m-[d]d`
* `[+-]yyyy*-[m]m-[d]d `
* `[+-]yyyy*-[m]m-[d]d *`
* `[+-]yyyy*-[m]m-[d]dT*`
```
I honestly don't know what a plus/minus sign at the beginning of a date even means, but you might want to handle that case. Also, the maximum number of digits allowed for the year is 7. Finally, once you have parsed the 'day' segment, it may be followed by either a ' ' or a 'T' (you are only handling the latter), and the characters after that are discarded.

It looks to me like Spark's custom implementation might be slightly faster, since it splits the string into segments and parses the digits in a single pass (and it does not need to prepare a parser with a format string). You might want to consider doing the same.
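To illustrate the idea, here is a minimal sketch of what such a single-pass parser could look like in Rust. It assumes `chrono` is available, ignores `eval_mode`/ANSI error semantics, and the function name `parse_spark_date` is hypothetical, not the actual Spark or Comet implementation:

```rust
// Hypothetical sketch of a single-pass Spark-style date parser.
// Accumulates the year/month/day digits in one scan over the bytes,
// without building a format string or making a second pass.
use chrono::NaiveDate;

fn parse_spark_date(value: &str) -> Option<i32> {
    let bytes = value.trim().as_bytes();
    if bytes.is_empty() {
        return None;
    }

    let mut segments = [0i32; 3]; // year, month, day
    let mut seg = 0usize;
    let mut digits = 0usize;
    let mut start = 0usize;

    // Optional leading sign on the year segment.
    let sign = match bytes[0] {
        b'-' => { start = 1; -1 }
        b'+' => { start = 1; 1 }
        _ => 1,
    };

    for &b in bytes.iter().skip(start) {
        match b {
            b'0'..=b'9' => {
                segments[seg] = segments[seg] * 10 + (b - b'0') as i32;
                digits += 1;
                // At most 7 digits for the year, 2 for month/day.
                if (seg == 0 && digits > 7) || (seg > 0 && digits > 2) {
                    return None;
                }
            }
            b'-' if seg < 2 => {
                if digits == 0 {
                    return None;
                }
                seg += 1;
                digits = 0;
            }
            // After the day segment a ' ' or 'T' ends the date; the rest is discarded.
            b' ' | b'T' if seg == 2 => break,
            _ => return None,
        }
    }
    if digits == 0 {
        return None;
    }

    let year = sign * segments[0];
    let month = if seg >= 1 { segments[1] as u32 } else { 1 };
    let day = if seg >= 2 { segments[2] as u32 } else { 1 };

    // Convert to days since the Unix epoch (the i32 date representation).
    let date = NaiveDate::from_ymd_opt(year, month, day)?;
    let epoch = NaiveDate::from_ymd_opt(1970, 1, 1)?;
    Some((date - epoch).num_days() as i32)
}
```

The shape mirrors what Spark does: a single scan accumulates the digits of each segment and stops at the first ' ' or 'T' after the day segment, so there is no format string to prepare and no second pass over the input.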