nuno-faria opened a new issue, #18982:
URL: https://github.com/apache/datafusion/issues/18982
### Describe the bug
I think some ClickBench queries are not being correctly executed, as
`EventDate` is being treated as a string.
For example, in query 36:
```sql
SELECT "URL", COUNT(*) AS PageViews
FROM hits
WHERE "CounterID" = 62
AND "EventDate" >= '2013-07-01'
AND "EventDate" <= '2013-07-31'
AND "DontCountHits" = 0
AND "IsRefresh" = 0
AND "URL" <> ''
GROUP BY "URL"
ORDER BY PageViews DESC
LIMIT 10;
```
When executing it, `EventDate` (originally a `UInt16`) will be casted to
`Utf8`:
```
DataSourceExec: ...
predicate=CounterID@1 = 62
AND CAST(EventDate@0 AS Utf8) >= 2013-07-01
AND CAST(EventDate@0 AS Utf8) <= 2013-07-31
AND DontCountHits@4 = 0
AND IsRefresh@3 = 0
AND URL@2 != , ...
```
This in turn makes the query return nothing. If we instead cast the
`EventDate` first to date then it will work as expected:
```sql
-- current
SELECT "URL", COUNT(*) AS PageViews
FROM hits
WHERE "CounterID" = 62
AND "EventDate" >= '2013-07-01'
AND "EventDate" <= '2013-07-31'
AND "DontCountHits" = 0
AND "IsRefresh" = 0
AND "URL" <> ''
GROUP BY "URL"
ORDER BY PageViews DESC
LIMIT 10;
+-----+-----------+
| URL | pageviews |
+-----+-----------+
+-----+-----------+
0 row(s) fetched.
Elapsed 0.120 seconds.
-- with cast
SELECT "URL", COUNT(*) AS PageViews
FROM hits
WHERE "CounterID" = 62
AND "EventDate"::int::date >= '2013-07-01'
AND "EventDate"::int::date <= '2013-07-31'
AND "DontCountHits" = 0
AND "IsRefresh" = 0
AND "URL" <> ''
GROUP BY "URL"
ORDER BY PageViews DESC
LIMIT 10;
+--------------------------------------------------------------------------+-----------+
| URL |
pageviews |
+--------------------------------------------------------------------------+-----------+
| http://irr.ru/index.php?showalbum/login-leniya7777294,938303130 |
102341 |
| http://komme%2F27.0.1453.116 |
51218 |
| http://irr.ru/index.php?showalbum/login-kapusta-advert2668]=0&order_by=0 |
18315 |
| http://irr.ru/index.php?showalbum/login-kapustic/product_name |
16461 |
| http://irr.ru/index.php |
12577 |
| http://irr.ru/index.php?showalbum/login |
10880 |
| http://komme%2F27.0.1453.116 Safari%2F5.0 (compatible; MSIE 9.0; |
7627 |
| http://irr.ru/index.php?showalbum/login-kupalnik |
4369 |
| http://irr.ru/index.php?showalbum/login-kapusta-advert27256.html_params |
4058 |
| http://komme%2F27.0.1453.116 Safari |
3021 |
+--------------------------------------------------------------------------+-----------+
10 row(s) fetched.
Elapsed 0.189 seconds.
```
I tracked it down to this issue
https://github.com/apache/datafusion/issues/15509 and this PR
https://github.com/apache/datafusion/pull/15574, where it has been removed by
mistake and the existing tests were not able to caught it. It includes queries
36 to 42. I think the confusion came from the fact that DuckDB also does not do
the cast when executing the query, [but it casts when creating the
view](https://github.com/ClickHouse/ClickBench/blob/d876e9bcf30f09cd5df0e025722db1e95374aa1c/duckdb-parquet/create.sql#L1-L4):
```sql
CREATE VIEW hits AS
SELECT *
REPLACE (make_date(EventDate) AS EventDate)
FROM read_parquet('hits.parquet', binary_as_string=True);
```
### To Reproduce
ClickBench.
### Expected behavior
Run data in queries 36-42.
### Additional context
_No response_
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]