alamb commented on code in PR #6832:
URL: https://github.com/apache/arrow-datafusion/pull/6832#discussion_r1267268245
##########
datafusion/core/tests/sqllogictests/test_files/decimal.slt:
##########
@@ -365,7 +365,7 @@ select c1*c5 from decimal_simple;
query T
select arrow_typeof(c1/cast(0.00001 as decimal(5,5))) from decimal_simple
limit 1;
----
-Decimal128(21, 12)
+Decimal128(19, 10)
Review Comment:
cc @liukun4515 and @viirya -- I assume you have reviewed the decimal changes
upstream
##########
datafusion/core/tests/sqllogictests/test_files/arrow_typeof.slt:
##########
@@ -303,9 +303,11 @@ select arrow_cast('30 minutes', 'Interval(MonthDayNano)');
## Duration
-query error DataFusion error: Optimizer rule 'simplify_expressions'
failed\ncaused by\nThis feature is not implemented: Can't create a scalar from
array of type "Duration\(Second\)"
+query ?
Review Comment:
🎉
##########
datafusion/core/tests/sql/timestamp.rs:
##########
@@ -1035,13 +1035,13 @@ async fn timestamp_sub_with_tz() -> Result<()> {
let sql = "SELECT val, ts1 - ts2 AS ts_diff FROM table_a ORDER BY ts2 -
ts1";
let actual = execute_to_batches(&ctx, sql).await;
let expected = vec![
- "+-----+---------------------------------------------------+",
- "| val | ts_diff |",
- "+-----+---------------------------------------------------+",
- "| 3 | 0 years 0 mons 0 days 10 hours 0 mins 30.000 secs |",
- "| 1 | 0 years 0 mons 0 days 10 hours 0 mins 20.000 secs |",
- "| 2 | 0 years 0 mons 0 days 10 hours 0 mins 10.000 secs |",
- "+-----+---------------------------------------------------+",
+ "+-----+---------+",
+ "| val | ts_diff |",
+ "+-----+---------+",
+ "| 3 | PT30S |",
Review Comment:
I remember the `PT30S` format is some sort of standard but I think it is a
pretty terrible UX and very uncommon.
I know there is a technical difference between duration and interval in the
arrow spec but as a user, I find this very confusing:
```
❯ select (now() - '2023-01-01'::timestamp);
+----------------------------+
| now() - Utf8("2023-01-01") |
+----------------------------+
| P198DT72932.972880S |
+----------------------------+
```
On the other hand the original formatting is not consistent with Postgres
either.
```
postgres=# select interval '5 minutes 2 days';
interval
-----------------
2 days 00:05:00
(1 row)
postgres=# select interval '5 minutes';
interval
----------
00:05:00
(1 row)
```
I played around with this locally and we can still get an `interval` with
an explicit cast
```sql
❯ select (now() - '2023-01-01'::timestamp);
+----------------------------+
| now() - Utf8("2023-01-01") |
+----------------------------+
| P198DT72932.972880S |
+----------------------------+
1 row in set. Query took 0.002 seconds.
❯ select (now() - '2023-01-01'::timestamp)::interval;
+------------------------------------------------------------+
| now() - Utf8("2023-01-01") |
+------------------------------------------------------------+
| 0 years 0 mons 0 days 4772 hours 15 mins 35.571791000 secs |
+------------------------------------------------------------+
1 row in set. Query took 0.002 seconds.
```
So we can perhaps add the appropriate coercion rules if necessary
Or perhaps we can change the formatting of Durations in arrow-rs to be more
interval like
@waitingkuo do you have any thoughts
##########
datafusion/physical-expr/src/expressions/binary.rs:
##########
@@ -2609,14 +2441,14 @@ mod tests {
Operator::Divide,
create_decimal_array(
&[
- Some(99193548387), // 0.99193548387
Review Comment:
why these changes?
##########
datafusion/physical-expr/src/expressions/datetime.rs:
##########
Review Comment:
👏
##########
datafusion/core/tests/sql/timestamp.rs:
##########
@@ -1035,13 +1035,13 @@ async fn timestamp_sub_with_tz() -> Result<()> {
let sql = "SELECT val, ts1 - ts2 AS ts_diff FROM table_a ORDER BY ts2 -
ts1";
let actual = execute_to_batches(&ctx, sql).await;
let expected = vec![
- "+-----+---------------------------------------------------+",
- "| val | ts_diff |",
- "+-----+---------------------------------------------------+",
- "| 3 | 0 years 0 mons 0 days 10 hours 0 mins 30.000 secs |",
- "| 1 | 0 years 0 mons 0 days 10 hours 0 mins 20.000 secs |",
- "| 2 | 0 years 0 mons 0 days 10 hours 0 mins 10.000 secs |",
- "+-----+---------------------------------------------------+",
+ "+-----+---------+",
+ "| val | ts_diff |",
+ "+-----+---------+",
+ "| 3 | PT30S |",
Review Comment:
I remember the `PT30S` format is some sort of standard but I think it is a
pretty terrible UX and very uncommon.
I know there is a technical difference between duration and interval in the
arrow spec but as a user, I find this very confusing:
```
❯ select (now() - '2023-01-01'::timestamp);
+----------------------------+
| now() - Utf8("2023-01-01") |
+----------------------------+
| P198DT72932.972880S |
+----------------------------+
```
On the other hand the original formatting is not consistent with Postgres
either.
```
postgres=# select interval '5 minutes 2 days';
interval
-----------------
2 days 00:05:00
(1 row)
postgres=# select interval '5 minutes';
interval
----------
00:05:00
(1 row)
```
I played around with this locally and we can still get an `interval` with
an explicit cast
```sql
❯ select (now() - '2023-01-01'::timestamp);
+----------------------------+
| now() - Utf8("2023-01-01") |
+----------------------------+
| P198DT72932.972880S |
+----------------------------+
1 row in set. Query took 0.002 seconds.
❯ select (now() - '2023-01-01'::timestamp)::interval;
+------------------------------------------------------------+
| now() - Utf8("2023-01-01") |
+------------------------------------------------------------+
| 0 years 0 mons 0 days 4772 hours 15 mins 35.571791000 secs |
+------------------------------------------------------------+
1 row in set. Query took 0.002 seconds.
```
So we can perhaps add the appropriate coercion rules if necessary
Or perhaps we can change the formatting of Durations in arrow-rs to be more
interval like
@waitingkuo do you have any thoughts
##########
datafusion/physical-expr/src/expressions/datetime.rs:
##########
Review Comment:
👏
##########
datafusion/core/tests/sqllogictests/test_files/interval.slt:
##########
@@ -430,15 +436,11 @@ select '1 month'::interval +
'1980-01-01T12:00:00'::timestamp;
----
1980-02-01T12:00:00
-query D
+query error DataFusion error: Error during planning: Cannot coerce arithmetic
expression Interval\(MonthDayNano\) \- Date32 to valid types
select '1 month'::interval - '1980-01-01'::date;
-----
-1979-12-01
-query P
+query error DataFusion error: Error during planning: Cannot coerce arithmetic
expression Interval\(MonthDayNano\) \- Timestamp\(Nanosecond, None\) to valid
types
select '1 month'::interval - '1980-01-01T12:00:00'::timestamp;
-----
Review Comment:
I agree an interval minus a timestamp doesn't make a lot of sense
##########
datafusion/core/tests/sqllogictests/test_files/binary.slt:
##########
@@ -110,14 +110,19 @@ ff0102 FixedSizeBinary(3)
000102 FixedSizeBinary(3)
# Comparison
-# https://github.com/apache/arrow-rs/pull/4492
-query error DataFusion error: Arrow error: Compute error: eq_dyn_binary_scalar
only supports Binary or LargeBinary arrays
Review Comment:
FYI @maxburke
##########
datafusion/core/tests/sqllogictests/test_files/arrow_typeof.slt:
##########
@@ -303,9 +303,11 @@ select arrow_cast('30 minutes', 'Interval(MonthDayNano)');
## Duration
-query error DataFusion error: Optimizer rule 'simplify_expressions'
failed\ncaused by\nThis feature is not implemented: Can't create a scalar from
array of type "Duration\(Second\)"
+query ?
Review Comment:
🎉
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]