alamb commented on issue #18070:
URL: https://github.com/apache/datafusion/issues/18070#issuecomment-3411006833
I was able to reproduce this. Thank you for the report, @ianthetechie.
<details><summary>repro.sql</summary>
<p>
```sql
CREATE EXTERNAL TABLE categories_raw STORED AS PARQUET LOCATION
's3://fsq-os-places-us-east-1/release/dt=2025-09-09/categories/parquet/';
CREATE EXTERNAL TABLE places STORED AS PARQUET LOCATION
's3://fsq-os-places-us-east-1/release/dt=2025-09-09/places/parquet/';
WITH categories_arr AS (
  SELECT array_agg(category_id) AS category_ids FROM categories_raw LIMIT 500
)
SELECT COUNT(*)
FROM places p
WHERE date_refreshed >= CURRENT_DATE - INTERVAL '365 days'
  AND array_has_any(p.fsq_category_ids, (SELECT category_ids FROM categories_arr));
```
</p>
</details>
With 49.0.2:
```shell
andrewlamb@Andrews-MacBook-Pro-3:~/Downloads$
~/Software/datafusion-cli/datafusion-cli-49.0.2 -f repro.sql
...
Elapsed 45.434 seconds.
```
With 50.2.0:
```shell
andrewlamb@Andrews-MacBook-Pro-3:~/Downloads$
~/Software/datafusion-cli/datafusion-cli-50.2.0 -f repro.sql
DataFusion CLI v50.2.0
...
```
I killed it after 3 minutes (I didn't let it finish).
Some initial profiling suggests 50.2.0 is somehow calling `slice` on an array a lot:
<img width="1484" height="1139" alt="Image"
src="https://github.com/user-attachments/assets/c878c714-932b-4e0f-8a90-2395e94efb0a"
/>
I am looking into it.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]