[GitHub] [arrow] Dandandan commented on a change in pull request #9359: ARROW-11426: [Rust][DataFusion] EXTRACT support

GitBox Sat, 30 Jan 2021 06:35:53 -0800


Dandandan commented on a change in pull request #9359:
URL: https://github.com/apache/arrow/pull/9359#discussion_r567253020




##########
File path: rust/datafusion/src/logical_plan/expr.rs
##########
@@ -169,6 +170,13 @@ pub enum Expr {
     },
     /// Represents a reference to all fields in a schema.
     Wildcard,
+    /// Extract date parts (day, hour, minute) from a date / time expression
+    Extract {

Review comment:
       I just looked into this. The downside currently is w.r.t. performance 
and being able to utilize Arrow kernels. The `ScalarFunction` implementation 
repeats scalar values, the date part, e.g. 'HOUR' for `date_part('HOUR', dt)` 
will be for repeated for each row.
   In PostgreSQL, expressions are not allowed for `date_part` / `extract`, 
`date_trunc` etc. :
   
   
https://www.postgresql.org/docs/13/functions-datetime.html#FUNCTIONS-DATETIME-EXTRACT
   
   > Note that here the field parameter needs to be a string value, not a name. 
The valid field names for date_part are the same as for extract. 
   
   This is also happening with the existing `date_trunc` function where 
currently the date part (as string) is repeated / matched against for each row 
and also evaluated per row (see below). That won't work with the `hour` kernel 
for obvious reasons.
   
   ```rust
       let result = range
           .map(|i| {
               if array.is_null(i) {
                   Ok(0_i64)
               } else {
                   let date_time = match granularity_array.value(i) {
                       "second" => array
                           .value_as_datetime(i)
                           .and_then(|d| d.with_nanosecond(0)),
                       "minute" => array
                           .value_as_datetime(i)
                           .and_then(|d| d.with_nanosecond(0))
                           .and_then(|d| d.with_second(0)),
   [...]
   ```
   So I think here we have a few options:
   
   * Refactor/Optimize `ScalarFunction` to also allow for scalar values, be 
able to check on them + support literals (I guess it should use `ColumnarValue` 
instead of just `Array`s).
   * Have a similar (inefficient) implementation for extract from / date_part 
to compute as currently `date_trunc`. I think that 
   * Refactor/Optimize `ScalarFunction` later and keep the Extract as is for 
now.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]

[GitHub] [arrow] Dandandan commented on a change in pull request #9359: ARROW-11426: [Rust][DataFusion] EXTRACT support

Reply via email to