alamb opened a new issue, #8295:
URL: https://github.com/apache/arrow-datafusion/issues/8295
### Describe the bug

While working on https://github.com/apache/arrow-datafusion/issues/8229 I found another bug that is non-obvious, but that can be clearly seen now thanks to https://github.com/apache/arrow-datafusion/issues/8110 and https://github.com/apache/arrow-datafusion/issues/8111 from @NGA-TRAN

### To Reproduce

```sql
❯ copy (values ('foo'), ('bar'), ('baz')) to '/tmp/strings.parquet';
+-------+
| count |
+-------+
| 3     |
+-------+
1 row in set. Query took 0.023 seconds.
```

Then, in the `explain verbose` output, you can see that no min/max statistics are shown:

```sql
❯ explain verbose select * from '/tmp/strings.parquet';
| | |
| physical_plan_with_stats | ParquetExec: file_groups={1 group: [[private/tmp/strings.parquet]]}, projection=[column1], statistics=[Rows=Exact(3), Bytes=Absent, [(Col[0]: Null=Exact(0))]] |
| | |
+------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------+
80 rows in set. Query took 0.002 seconds.
```

### Expected behavior

I expect there to be min/max values extracted in the statistics for the strings, as there are for integers (`(Col[0]: Min=Exact(Int64(1)) Max=Exact(Int64(3))`):

```shell
❯ copy (values (1), (2), (3)) to '/tmp/ints.parquet';
+-------+
| count |
+-------+
| 3     |
+-------+
1 row in set. Query took 0.023 seconds.
```

```sql
❯ explain verbose select * from '/tmp/ints.parquet';
...
| | physical_plan | ParquetExec: file_groups={1 group: [[private/tmp/ints.parquet]]}, projection=[column1] |
| | |
| physical_plan_with_stats | ParquetExec: file_groups={1 group: [[private/tmp/ints.parquet]]}, projection=[column1], statistics=[Rows=Exact(3), Bytes=Absent, [(Col[0]: Min=Exact(Int64(1)) Max=Exact(Int64(3)) Null=Exact(0))]] |
| | |
+------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
```

### Additional context

_No response_

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
