parthchandra commented on code in PR #383: URL: https://github.com/apache/datafusion-comet/pull/383#discussion_r1594382358
##########
core/src/execution/datafusion/expressions/cast.rs:
##########
@@ -954,13 +993,63 @@ fn parse_str_to_time_only_timestamp(value: &str) -> CometResult<Option<i64>> {
     Ok(Some(timestamp))
 }

+fn date_parser(value: &str, eval_mode: EvalMode) -> CometResult<Option<i32>> {

Review Comment:
I wasn't familiar with Spark's string-to-date conversion, so I took a look (https://github.com/apache/spark/blob/9d79ab42b127d1a12164cec260bfbd69f6da8b74/sql/api/src/main/scala/org/apache/spark/sql/catalyst/util/SparkDateTimeUtils.scala#L312). According to the comment there, the allowed formats are:
```
* `[+-]yyyy*`
* `[+-]yyyy*-[m]m`
* `[+-]yyyy*-[m]m-[d]d`
* `[+-]yyyy*-[m]m-[d]d `
* `[+-]yyyy*-[m]m-[d]d *`
* `[+-]yyyy*-[m]m-[d]dT*`
```
I honestly don't know what a plus/minus sign at the beginning of a date even means, but you might want to handle that case. Also, the maximum number of digits allowed for the year is 7. Finally, once you have parsed the 'day' segment, it may be followed by either a ' ' or a 'T' (you are only handling the latter), and the characters after that are discarded.

It looks to me like Spark's custom implementation might be slightly faster, since it splits the string into segments and parses the digits in a single pass (and it does not need to prepare a parser with a format string). You might want to consider doing the same.
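To illustrate the idea, here is a minimal sketch of what such a single-pass parser could look like in Rust. It assumes `chrono` is available, ignores `eval_mode`/ANSI error semantics, and the function name `parse_spark_date` is hypothetical, not the actual Spark or Comet implementation:

```rust
// Hypothetical sketch of a single-pass Spark-style date parser.
// Accumulates the year/month/day digits in one scan over the bytes,
// without building a format string or making a second pass.
use chrono::NaiveDate;

fn parse_spark_date(value: &str) -> Option<i32> {
    let bytes = value.trim().as_bytes();
    if bytes.is_empty() {
        return None;
    }

    let mut segments = [0i32; 3]; // year, month, day
    let mut seg = 0usize;
    let mut digits = 0usize;
    let mut start = 0usize;

    // Optional leading sign on the year segment.
    let sign = match bytes[0] {
        b'-' => { start = 1; -1 }
        b'+' => { start = 1; 1 }
        _ => 1,
    };

    for &b in bytes.iter().skip(start) {
        match b {
            b'0'..=b'9' => {
                segments[seg] = segments[seg] * 10 + (b - b'0') as i32;
                digits += 1;
                // At most 7 digits for the year, 2 for month/day.
                if (seg == 0 && digits > 7) || (seg > 0 && digits > 2) {
                    return None;
                }
            }
            b'-' if seg < 2 => {
                if digits == 0 {
                    return None;
                }
                seg += 1;
                digits = 0;
            }
            // After the day segment a ' ' or 'T' ends the date; the rest is discarded.
            b' ' | b'T' if seg == 2 => break,
            _ => return None,
        }
    }
    if digits == 0 {
        return None;
    }

    let year = sign * segments[0];
    let month = if seg >= 1 { segments[1] as u32 } else { 1 };
    let day = if seg >= 2 { segments[2] as u32 } else { 1 };

    // Convert to days since the Unix epoch (the i32 date representation).
    let date = NaiveDate::from_ymd_opt(year, month, day)?;
    let epoch = NaiveDate::from_ymd_opt(1970, 1, 1)?;
    Some((date - epoch).num_days() as i32)
}
```

The shape mirrors what Spark does: a single scan accumulates the digits of each segment and stops at the first ' ' or 'T' after the day segment, so there is no format string to prepare and no second pass over the input.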