waitingkuo opened a new issue, #3050:
URL: https://github.com/apache/arrow-datafusion/issues/3050

   **Describe the bug**
   
   This is part of #3048 
   
   I was running the [ClickBench](https://benchmark.clickhouse.com/) benchmark. One of its columns is binary, and the test query set contains a `GROUP BY` on that binary column. I got this error:
   ```
   Internal error: Unsupported data type in hasher: Binary. This was likely caused by a bug in DataFusion's code and we would welcome that you file an bug report in our issue tracker
   ```
   
   **To Reproduce**
   Steps to reproduce the behavior:
   
   ``` bash
   
   # Download data
   wget https://datasets.clickhouse.com/hits_compatible/athena_partitioned/hits_0.parquet
   
   # Use Datafusion-CLI
   ➜  datafusion git:(datafusion) ✗ datafusion-cli
   DataFusion CLI v10.0.0
   
   # Create External Table 
   ❯ CREATE EXTERNAL TABLE hits STORED AS PARQUET LOCATION 'hits_0.parquet';
   0 rows in set. Query took 0.002 seconds.
   
   # This query works
   ❯ SELECT "URL" FROM hits LIMIT 10;
   +--------------------------------------------------------------------------------------------------------------------------------------------------+
   | URL                                                                                                                                              |
   +--------------------------------------------------------------------------------------------------------------------------------------------------+
   |                                                                                                                                                  |
   |                                                                                                                                                  |
   |                                                                                                                                                  |
   |                                                                                                                                                  |
   | 687474703a2f2f686f6c6f64696c6e696b2e72752f7275737369612f30356a756c32303133266d6f64656c3d30                                                       |
   | 687474703a2f2f6166697368612e6d61696c2e72752f636174616c6f672f3331342f776f6d656e2e72752f656e63793d312670616765332f3f6572726f7661742d70696e6e696b69 |
   | 687474703a2f2f626f6e707269782e72752f696e6465782e72752f63696e656d612f6172742f3020393836203432342032333320d181d0b5d0b7d0bed0bd                     |
   | 687474703a2f2f626f6e707269782e72752f696e6465782e72752f63696e656d612f6172742f4130303338372c33373937293b2072752926624c                             |
   | 687474703a2f2f746f7572732f456b617465676f726979612532462673723d687474703a2f2f736c6f766172656e697965                                               |
   |                                                                                                                                                  |
   +--------------------------------------------------------------------------------------------------------------------------------------------------+
   10 rows in set. Query took 0.006 seconds.
   
   # This one doesn't work
   ❯ SELECT "URL" FROM hits GROUP BY "URL" LIMIT 10;
   ArrowError(ExternalError(Execution("Internal error: Unsupported data type in hasher: Binary. This was likely caused by a bug in DataFusion's code and we would welcome that you file an bug report in our issue tracker")))
   
   10 rows in set. Query took 0.006 seconds.
   ```
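
The non-empty rows in the first result appear to be hex renderings of the raw column bytes, which is consistent with `URL` being stored as Binary rather than as a string type. As a quick sanity check (a sketch only, not part of the original reproduction; the variable names are illustrative), decoding the first non-empty value in Python yields an ordinary URL:

```python
# Hex text copied from the first non-empty row of the SELECT output above.
hex_value = (
    "687474703a2f2f686f6c6f64696c6e696b2e72752f7275737369612f"
    "30356a756c32303133266d6f64656c3d30"
)

# Turn the hex text back into raw bytes, then interpret the bytes as UTF-8.
raw = bytes.fromhex(hex_value)
url = raw.decode("utf-8")

print(url)  # http://holodilnik.ru/russia/05jul2013&model=0
```

So the values being grouped are plain byte strings, and the failure comes from the hasher not handling the Binary type rather than from the data itself.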
   
   **Expected behavior**
   The `GROUP BY` query on the binary column should succeed and return the grouped values, as the plain `SELECT` does, instead of failing with an internal error.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
