alamb commented on code in PR #9689:
URL: https://github.com/apache/arrow-datafusion/pull/9689#discussion_r1534272307
##########
datafusion/functions/src/datetime/to_char.rs:
##########
@@ -172,6 +172,19 @@ fn _to_char_scalar(
let data_type = &expression.data_type();
let is_scalar_expression = matches!(&expression, ColumnarValue::Scalar(_));
let array = expression.into_array(1)?;
+
+ if format.is_none() {
+ if is_scalar_expression {
+ return Ok(ColumnarValue::Scalar(ScalarValue::Utf8(
+ Some(String::new()),
Review Comment:
I thought `None` is the correct value (as it will semantically be a `NULL`,
which is the correct result).
The sqllogictests have special formatting for `NULL` values (to distinguish
them from empty strings):
https://github.com/apache/arrow-datafusion/blob/1d8a41bc8e08b56e90d6f8e6ef20e39a126987e4/datafusion/sqllogictest/src/engines/datafusion_engine/normalize.rs#L198-L200
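For reference, here is a minimal sketch of the distinction in play (assuming the `datafusion_common` crate and its `Display` impl for `ScalarValue`, which renders `None` variants as `NULL`):
```rust
// Minimal sketch: ScalarValue::Utf8(None) is a SQL NULL, while
// Utf8(Some(String::new())) is an empty string -- sqllogictest renders
// the two differently, so they are observably distinct results.
use datafusion_common::ScalarValue;

fn main() {
    let null_value = ScalarValue::Utf8(None);
    let empty_string = ScalarValue::Utf8(Some(String::new()));

    println!("{null_value}");   // prints: NULL
    println!("{empty_string}"); // prints an empty line
}
```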
However, when I double-checked, the behavior of `to_date` in Spark seems to
be different still: passing in a null format simply ignores the format
string and parses with the default -- it doesn't return `null`.
```python
>>> df = spark.createDataFrame([('1997-02-28 10:30:00',)], ['t'])
>>> df.select(functions.to_date(df.t, 'yyyy-MM-dd HH:mm:ss').alias('date')).collect()
[Row(date=datetime.date(1997, 2, 28))]
>>> df.select(functions.to_date(df.t, None).alias('date')).collect()
[Row(date=datetime.date(1997, 2, 28))]
```
```python
>>> df = spark.createDataFrame([('1997-02-2ddddd',)], ['t'])
>>> df.select(functions.to_date(df.t, None).alias('date')).collect()
[Row(date=None)]
>>> df.select(functions.to_date(df.t, 'yyyy-MM-dd HH:mm:ss').alias('date')).collect()
[Row(date=None)]
```
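If we wanted to mirror Spark's behavior instead, one option would be to fall back to the default rendering when the format is null. A hedged sketch (the `default_to_char` name and the arrow-cast fallback are my assumptions, not this PR's code):
```rust
// Hedged sketch: when `format` is NULL, render with arrow's default Utf8
// cast instead of returning NULL, matching the Spark behavior shown above.
use arrow::array::ArrayRef;
use arrow::compute::cast;
use arrow::datatypes::DataType;
use datafusion_common::{Result, ScalarValue};
use datafusion_expr::ColumnarValue;

fn default_to_char(array: ArrayRef, is_scalar_expression: bool) -> Result<ColumnarValue> {
    // Arrow's cast kernel formats temporal types with a default pattern.
    let formatted = cast(&array, &DataType::Utf8)?;
    if is_scalar_expression {
        // Collapse the one-row array back into a scalar, preserving NULLs.
        Ok(ColumnarValue::Scalar(ScalarValue::try_from_array(&formatted, 0)?))
    } else {
        Ok(ColumnarValue::Array(formatted))
    }
}
```
Either way, documenting which semantics we pick seems worthwhile.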
Any hints, @Omega359?