alamb opened a new issue, #7039:
URL: https://github.com/apache/arrow-datafusion/issues/7039

   ### Describe the bug
   
   When running the following query (from ClickBench) on the partitioned dataset (100 parquet files):
   
   ```sql
   SELECT "MobilePhoneModel", COUNT(DISTINCT "UserID") AS u
   FROM hits_partitioned
   WHERE "MobilePhoneModel" <> ''
   GROUP BY "MobilePhoneModel"
   ORDER BY u DESC
   LIMIT 10;
   ```
   
   I get the following error:
   ```
   Error during planning: Cannot infer common argument type for comparison operation Binary != Utf8
   ```
   
   ### To Reproduce
   
   Get the data using `bench.sh` (after https://github.com/apache/arrow-datafusion/pull/7005 is merged):
   ```shell
   bench.sh data clickbench_1
   bench.sh data clickbench_multi
   ```
   
   ```sql
   
   CREATE EXTERNAL TABLE hits_partitioned
   STORED AS PARQUET
   LOCATION 'hits_partitioned';
   
   SELECT "MobilePhoneModel", COUNT(DISTINCT "UserID") AS u
   FROM hits_partitioned
   WHERE "MobilePhoneModel" <> ''
   GROUP BY "MobilePhoneModel"
   ORDER BY u DESC
   LIMIT 10;
   ```
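   
   The `Binary` in the error message suggests the column may be read as binary rather than string data from the partitioned files. A minimal sketch for checking this, assuming the `arrow_typeof` function is available in the DataFusion version being used:
   
   ```sql
   -- Sketch: report the Arrow type DataFusion assigns to the column.
   -- Assumes arrow_typeof is available in this DataFusion version.
   SELECT arrow_typeof("MobilePhoneModel") FROM hits_partitioned LIMIT 1;
   ```
   
   Running the same statement against a table created from the single file would show whether the two datasets disagree on the column type.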
   
   ### Expected behavior
   
   The query works fine with the single file dataset. I expect the same result from the partitioned dataset.
   
   ```sql
   -- Single file parquet
   CREATE EXTERNAL TABLE hits_single
   STORED AS PARQUET
   LOCATION 'hits.parquet';
   
   -- Single file works great
   SELECT "MobilePhoneModel", COUNT(DISTINCT "UserID") AS u
   FROM hits_single
   WHERE "MobilePhoneModel" <> ''
   GROUP BY "MobilePhoneModel"
   ORDER BY u DESC
   LIMIT 10;
   ```
   
   ```
   +------------------------------+---------+
   | hits_single.MobilePhoneModel | u       |
   +------------------------------+---------+
   | iPad                         | 1090347 |
   | iPhone                       | 45758   |
   | A500                         | 16046   |
   | N8-00                        | 5565    |
   | iPho                         | 3300    |
   | ONE TOUCH 6030A              | 2759    |
   | GT-P7300B                    | 1907    |
   | 3110000                      | 1871    |
   | GT-I9500                     | 1598    |
   | eagle75                      | 1492    |
   +------------------------------+---------+
   ```
   
   ### Additional context
   
   I found this while working on some benchmark results for #6988 

