alamb commented on PR #5554:
URL:
https://github.com/apache/arrow-datafusion/pull/5554#issuecomment-1472087640
> ClickBench count distinct query when using dictionary columns is getting
> killed (this is on main as well as the PR) 🤔
I wonder if we can try a smaller subset 🤔
```sql
❯ CREATE TABLE hits as select
arrow_cast("UserID", 'Dictionary(Int32, Utf8)') as "UserID"
FROM 'hits.parquet'
limit 10000000;
0 rows in set. Query took 0.776 seconds.
❯ select count(distinct "UserID") from hits;
+-----------------------------+
| COUNT(DISTINCT hits.UserID) |
+-----------------------------+
| 1530334                     |
+-----------------------------+
1 row in set. Query took 71.388 seconds.
```
I will try this on my benchmark machine.