alamb commented on code in PR #12514:
URL: https://github.com/apache/datafusion/pull/12514#discussion_r1767487219
##########
datafusion/sqllogictest/test_files/expr.slt:
##########
@@ -1472,6 +1472,135 @@ SELECT extract(epoch from arrow_cast('1969-12-31',
'Date64'))
----
-86400
+# test_extract_interval
Review Comment:
Can you please add some tests for:
1. Tables with intervals (including some null values) rather than just
scalars
2. Nulls
?
##########
datafusion/functions/src/datetime/date_part.rs:
##########
@@ -223,9 +239,17 @@ fn seconds(array: &dyn Array, unit: TimeUnit) ->
Result<ArrayRef> {
let secs = as_int32_array(secs.as_ref())?;
let subsecs = date_part(array, DatePart::Nanosecond)?;
let subsecs = as_int32_array(subsecs.as_ref())?;
+ // REVIEW: is there a better way to do this? (Is it even a reasonable
thing to do?) I want to treat
+ // null as 0, I was expecting some kind of map function (or even better
map_if_null) on PrimitiveArray.
+ // Instead I have just discarded the null mask, which feels bad because if
I was meant to be able
+ // to do this, I'd expect there to be a function to do it. I think it is
also possible that a
+ // null-masked value in the array will not be 0 (I don't think that
happens right now, but I think
+ // we're one optimisation away from disaster).
+ let subsecs: PrimitiveArray<Int32Type> =
+ PrimitiveArray::new(subsecs.values().clone(), None);
- let r: Float64Array = binary(secs, subsecs, |secs, subsecs| {
- (secs as f64 + (subsecs as f64 / 1_000_000_000_f64)) * sf
+ let r: Float64Array = binary(secs, &subsecs, |secs, subsecs| {
+ (secs as f64 + ((subsecs % 1_000_000_000) as f64 / 1_000_000_000_f64))
* sf
Review Comment:
I think the more correct way is to use a kernel other than
[binary](https://docs.rs/arrow/latest/arrow/compute/fn.binary.html#details)
which s designed to be faster by not having to check each element for null
For example, something like this (untested):
```rust
let r: Float64Array = subsecs.null_count() == 0 {
// special case binary when there are no nulls for speed
binary(secs, subsecs, |secs, subsecs| {
(secs as f64 + (subsecs as f64 / 1_000_000_000_f64)) * sf
})?
} else {
// handle nulls
secs.iter().zip(subsecs.iter()).map(|(secs, subsecs) {
secs.map(|secs| {
let subsecs = subsecs.unwrap_or(0); // default null to 0
(secs as f64 + ((subsecs % 1_000_000_000) as f64 /
1_000_000_000_f64)) * sf
})
}).collect()
};
```
##########
datafusion/sqllogictest/test_files/expr.slt:
##########
@@ -1472,6 +1472,135 @@ SELECT extract(epoch from arrow_cast('1969-12-31',
'Date64'))
----
-86400
+# test_extract_interval
+
+query R
+SELECT extract(year from arrow_cast('10 years', 'Interval(YearMonth)'))
+----
+10
+
+query R
+SELECT extract(month from arrow_cast('10 years', 'Interval(YearMonth)'))
+----
+0
+
+query R
+SELECT extract(year from arrow_cast('10 months', 'Interval(YearMonth)'))
+----
+0
+
+query R
+SELECT extract(month from arrow_cast('10 months', 'Interval(YearMonth)'))
+----
+10
+
+query R
+SELECT extract(year from arrow_cast('20 months', 'Interval(YearMonth)'))
+----
+1
+
+query R
+SELECT extract(month from arrow_cast('20 months', 'Interval(YearMonth)'))
+----
+8
+
+query error DataFusion error: Arrow error: Compute error: Year does not
support: Interval\(DayTime\)
+SELECT extract(year from arrow_cast('10 days', 'Interval(DayTime)'))
+
+query error DataFusion error: Arrow error: Compute error: Month does not
support: Interval\(DayTime\)
+SELECT extract(month from arrow_cast('10 days', 'Interval(DayTime)'))
+
+query R
+SELECT extract(day from arrow_cast('10 days', 'Interval(DayTime)'))
+----
+10
+
+query R
+SELECT extract(day from arrow_cast('14400 minutes', 'Interval(DayTime)'))
+----
+0
+
+query R
+SELECT extract(minute from arrow_cast('14400 minutes', 'Interval(DayTime)'))
+----
+14400
+
+query R
+SELECT extract(second from arrow_cast('5.1 seconds', 'Interval(DayTime)'))
+----
+5
+
+query R
+SELECT extract(second from arrow_cast('14400 minutes', 'Interval(DayTime)'))
+----
+864000
+
+query R
+SELECT extract(second from arrow_cast('2 months', 'Interval(MonthDayNano)'))
+----
+0
+
+query R
+SELECT extract(second from arrow_cast('2 days', 'Interval(MonthDayNano)'))
+----
+0
+
+query R
+SELECT extract(second from arrow_cast('2 seconds', 'Interval(MonthDayNano)'))
+----
+2
+
+query R
+SELECT extract(milliseconds from arrow_cast('2 seconds',
'Interval(MonthDayNano)'))
+----
+2000
+
+query R
+SELECT extract(second from arrow_cast('2030 milliseconds',
'Interval(MonthDayNano)'))
+----
+2.03
+
+# test_extract_duration
+
+query R
+SELECT extract(second from arrow_cast(2, 'Duration(Second)'))
Review Comment:
I suggest adding a test for `extract(seconds ..)` to this PR -- even if it
is just to document it doesn't work
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]