watford-ep opened a new issue, #18327:
URL: https://github.com/apache/datafusion/issues/18327
### Describe the bug
Sometimes the `SortPreservingMerge` sanity check rejects valid ORDER BY with
CASE expressions. This doesn't universally happen, but seems to happen only the
columns are from joins (debug output reformatted for readability):
```
Plan: [
SortPreservingMergeExec: [CASE WHEN typtype@3 = b OR typtype@3 = e OR
typtype@3 = p THEN 0 WHEN typtype@3 = r THEN 1 END ASC NULLS LAST]
SortExec: expr=[CASE WHEN typtype@3 = b OR typtype@3 = e OR typtype@3 = p
THEN 0 WHEN typtype@3 = r THEN 1 ELSE 999 END ASC NULLS LAST],
preserve_partitioning=[true]
ProjectionExec: expr=[nspname@3 as nspname, oid@0 as oid, typname@1 as
typname, typtype@2 as typtype]
CoalesceBatchesExec: target_batch_size=8192
HashJoinExec: mode=CollectLeft, join_type=Inner,
on=[(typnamespace@2, oid@0)], projection=[oid@0, typname@1, typtype@3,
nspname@5]
CoalesceBatchesExec: target_batch_size=8192
FilterExec: Use typtype@3 IN (SET) ([Literal { value: Utf8("b"),
field: Field { name: "lit", data_type: Utf8, nullable: false, dict_id: 0,
dict_is_ordered: false, metadata: {} } }, Literal { value: Utf8("r"), field:
Field { name: "lit", data_type: Utf8, nullable: false, dict_id: 0,
dict_is_ordered: false, metadata: {} } }, Literal { value: Utf8("m"), field:
Field { name: "lit", data_type: Utf8, nullable: false, dict_id: 0,
dict_is_ordered: false, metadata: {} } }, Literal { value: Utf8("e"), field:
Field { name: "lit", data_type: Utf8, nullable: false, dict_id: 0,
dict_is_ordered: false, metadata: {} } }, Literal { value: Utf8("d"), field:
Field { name: "lit", data_type: Utf8, nullable: false, dict_id: 0,
dict_is_ordered: false, metadata: {} } }])
DataSourceExec: partitions=1, partition_sizes=[1]
RepartitionExec: partitioning=RoundRobinBatch(12),
input_partitions=1
StreamingTableExec: partition_sizes=1, projection=[oid, nspname]
] does not satisfy order requirements:
[CASE WHEN typtype@3 = b OR typtype@3 = e OR typtype@3 = p THEN 0 WHEN
typtype@3 = r THEN 1 END ASC NULLS LAST].
Child-0 order:
[[CASE WHEN typtype@3 = b OR typtype@3 = e OR typtype@3 = p THEN 0 WHEN
typtype@3 = r THEN 1 END ASC NULLS LAST]]
```
I think this shows that CASE expressions lose their ability to be compared
for equality. Projecting the CASE and ordering on the projection does not fail.
I found this using a tool which wraps [datafusion around the postgres wire
protocol](https://github.com/datafusion-contrib/datafusion-postgres) so that I
could use our Iceberg data lake from a BI tool which supports Postgres.
### To Reproduce
I did some bisect'ing on [a query issued by npgsql at
startup](https://github.com/npgsql/npgsql/issues/4545) and found this minimal
reproduction:
```sql
SELECT ns.nspname, typ.oid, typ.typname, typ.typtype
FROM pg_type AS typ JOIN pg_namespace AS ns ON (ns.oid = typ.typnamespace)
WHERE typ.typtype IN ('b','r','m','e','d')
ORDER BY CASE WHEN typ.typtype IN ('b','e','p') THEN 0 WHEN typ.typtype =
'r' THEN 1 END
```
I initially thought a missing `ELSE` clause was the culprit, but it does not
change anything:
```sql
SELECT ns.nspname, typ.oid, typ.typname, typ.typtype
FROM pg_type AS typ JOIN pg_namespace AS ns ON (ns.oid = typ.typnamespace)
WHERE typ.typtype IN ('b','r','m','e','d')
ORDER BY CASE WHEN typ.typtype IN ('b','e','p') THEN 0 WHEN typ.typtype =
'r' THEN 1 ELSE 999 END
```
Using:
https://github.com/watford-ep/datafusion-postgres/tree/fix-revealbi-startup-sql
```sh
$ cargo test --test revealbi_minimal_repro
```
### Expected behavior
This query executes because it contains a valid ORDER BY which is stable
across partitions, with or without the ELSE clause.
### Additional context
This succeeds:
```sql
SELECT ns.nspname, typ.oid, typ.typname, typ.typtype
FROM pg_type AS typ JOIN pg_namespace AS ns ON (ns.oid =
typ.typnamespace)
WHERE typ.typtype IN ('b','r','m','e','d')
ORDER BY CASE WHEN COALESCE(typ.typtype, '') IN ('b','e','p') THEN 0
WHEN COALESCE(typ.typtype, '') = 'r' THEN 1 END
```
But applying the same fix to the original query does not.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]