alamb opened a new issue, #8295:
URL: https://github.com/apache/arrow-datafusion/issues/8295

   ### Describe the bug
   
   While working on https://github.com/apache/arrow-datafusion/issues/8229 I 
found another bug that is non-obvious but can now be clearly seen thanks 
to https://github.com/apache/arrow-datafusion/issues/8110 and 
https://github.com/apache/arrow-datafusion/issues/8111 from @NGA-TRAN 
   
   
   
   
   ### To Reproduce
   
   ```sql
   ❯ copy (values ('foo'), ('bar'), ('baz')) to '/tmp/strings.parquet';
   +-------+
   | count |
   +-------+
   | 3     |
   +-------+
   1 row in set. Query took 0.023 seconds.
   ```
   
   Then look at the `explain verbose` output; you can see that no min/max 
statistics are shown:
   
   ```sql
   ❯ explain verbose select * from '/tmp/strings.parquet';
   
   | physical_plan_with_stats | ParquetExec: file_groups={1 group: [[private/tmp/strings.parquet]]}, projection=[column1], statistics=[Rows=Exact(3), Bytes=Absent, [(Col[0]: Null=Exact(0))]] |
   80 rows in set. Query took 0.002 seconds.
   ```
   
   ### Expected behavior
   
   I expect there to be min/max values extracted in the statistics for the 
strings, as there are for integers (`(Col[0]: Min=Exact(Int64(1)) 
Max=Exact(Int64(3))`)
   
   ```sql
   ❯ copy (values (1), (2), (3)) to '/tmp/ints.parquet';
   +-------+
   | count |
   +-------+
   | 3     |
   +-------+
   1 row in set. Query took 0.023 seconds.
   ```
   
   ```sql
   ❯ explain verbose select * from '/tmp/ints.parquet';
   ...
   | physical_plan | ParquetExec: file_groups={1 group: [[private/tmp/ints.parquet]]}, projection=[column1] |
   | physical_plan_with_stats | ParquetExec: file_groups={1 group: [[private/tmp/ints.parquet]]}, projection=[column1], statistics=[Rows=Exact(3), Bytes=Absent, [(Col[0]: Min=Exact(Int64(1)) Max=Exact(Int64(3)) Null=Exact(0))]] |
   ```
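
For comparison, here is a rough sketch of the values one would expect to see for the strings file from the reproduction. It is not DataFusion code: `expected_string_stats` is a hypothetical helper, and it assumes string min/max are determined by unsigned byte-wise comparison of the UTF-8 encoding, which is my understanding of how Parquet orders string columns.

```python
# Hypothetical helper, not part of DataFusion or Parquet: it only illustrates
# which min/max/null-count values would be expected for a string column,
# assuming unsigned byte-wise ordering of the UTF-8 encoded values.
def expected_string_stats(values):
    # Nulls do not take part in min/max; they are only counted.
    null_count = sum(1 for v in values if v is None)
    encoded = [v.encode("utf-8") for v in values if v is not None]
    if not encoded:
        return None, None, null_count
    # Python compares bytes objects lexicographically by unsigned byte value.
    smallest = min(encoded).decode("utf-8")
    largest = max(encoded).decode("utf-8")
    return smallest, largest, null_count


if __name__ == "__main__":
    # Same three values as written to /tmp/strings.parquet above.
    print(expected_string_stats(["foo", "bar", "baz"]))
```

Under that assumption, the strings file would be expected to report a minimum of `bar`, a maximum of `foo`, and a null count of 0, rather than only the null count.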
   
   
   ### Additional context
   
   _No response_

