alamb commented on a change in pull request #9359:
URL: https://github.com/apache/arrow/pull/9359#discussion_r567411606



##########
File path: rust/datafusion/tests/sql.rs
##########
@@ -1964,6 +1964,20 @@ async fn crypto_expressions() -> Result<()> {
     Ok(())
 }
 
+#[tokio::test]
+async fn extract_date_part() -> Result<()> {
+    let mut ctx = ExecutionContext::new();
+    let sql = "SELECT
+        EXTRACT(HOUR FROM CAST('2020-01-01' AS DATE)),
+        EXTRACT(HOUR FROM to_timestamp('2020-09-08T12:00:00+00:00'))

Review comment:
       I think adding coverage for the other date parts would be valuable here 
(even if they error)

##########
File path: rust/datafusion/src/logical_plan/expr.rs
##########
@@ -169,6 +170,13 @@ pub enum Expr {
     },
     /// Represents a reference to all fields in a schema.
     Wildcard,
+    /// Extract date parts (day, hour, minute) from a date / time expression
+    Extract {

Review comment:
       I also agree (belatedly) that having a more efficient implementation of 
constant function arguments (e.g. #9376) is the way to go!
   
   In terms of adding variants to `Expr` I think it will need be done when the 
semantics of whatever expression is being added can't realistically be 
expressed as a function (e.g. CASE). 
   
   So in this case, given @jorgecarleitao is cranking along with #9376 it seems 
like perhaps this PR should perhaps try and translate `EXTRACT` into a function.

##########
File path: rust/datafusion/src/physical_plan/expressions/extract.rs
##########
@@ -0,0 +1,115 @@
+use crate::error::Result;
+use core::fmt;
+use std::{any::Any, sync::Arc};
+
+use arrow::{
+    array::{
+        Date32Array, Date64Array, TimestampMicrosecondArray, 
TimestampMillisecondArray,
+        TimestampNanosecondArray, TimestampSecondArray,
+    },
+    compute::hour,
+    datatypes::{DataType, Schema, TimeUnit},
+    record_batch::RecordBatch,
+};
+
+use crate::{
+    error::DataFusionError,
+    logical_plan::DatePart,
+    physical_plan::{ColumnarValue, PhysicalExpr},
+};
+
+impl fmt::Display for Extract {
+    fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
+        write!(f, "EXTRACT({} AS {:?})", self.date_part, self.expr)
+    }
+}
+
+/// Extract
+#[derive(Debug)]
+pub struct Extract {
+    date_part: DatePart,
+    expr: Arc<dyn PhysicalExpr>,
+}
+
+impl Extract {
+    /// Create new Extract expression
+    pub fn new(date_part: DatePart, expr: Arc<dyn PhysicalExpr>) -> Self {
+        Self { date_part, expr }
+    }
+}
+
+impl PhysicalExpr for Extract {
+    fn as_any(&self) -> &dyn Any {
+        self
+    }
+
+    fn data_type(&self, _input_schema: &Schema) -> Result<DataType> {
+        Ok(DataType::Int32)
+    }
+
+    fn nullable(&self, input_schema: &Schema) -> Result<bool> {
+        self.expr.nullable(input_schema)
+    }
+
+    fn evaluate(&self, batch: &RecordBatch) -> Result<ColumnarValue> {
+        let value = self.expr.evaluate(batch)?;
+        let data_type = value.data_type();
+        let array = match value {
+            ColumnarValue::Array(array) => array,
+            ColumnarValue::Scalar(scalar) => scalar.to_array(),
+        };
+
+        match data_type {
+            DataType::Date32 => {
+                let array = 
array.as_any().downcast_ref::<Date32Array>().unwrap();
+                Ok(ColumnarValue::Array(Arc::new(hour(array)?)))

Review comment:
       I find it confusing that `date_part` is passed all the way down in the 
Exprs / trees only to be ignored in the actual implementation which directly 
calls `hour. I can see that the expr seems to always be made with 
`DatePart::Hour` but I am not 100% sure. 
   
   I am fine with not supporting all the various date parts initially, but I 
would recommend the following as a way of documenting through code / safe guard 
against future bugs:
   
   1. Add a test for `EXTRACT DAY from timestamp` and show that it generates a 
useful error
   2. Add a check in this function for `date_part != DataPart::Hour` and throw 
an error




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]


Reply via email to