andygrove edited a comment on issue #863: URL: https://github.com/apache/arrow-datafusion/issues/863#issuecomment-897273083
If I comment out the code in our parquet reader that filters out row groups based on predicates then I see the expected results.

```
> SELECT COUNT(*) FROM customer WHERE c_mktsegment = 'BUILDING';
+-----------------+
| COUNT(UInt8(1)) |
+-----------------+
| 29998146        |
+-----------------+
1 row in set. Query took 0.874 seconds.
```

My conclusion is that we have a bug in our DataFusion/Parquet writer where we are writing incorrect statistics somehow.
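To illustrate why incorrect statistics would produce this symptom, here is a minimal sketch of min/max row-group pruning for an equality predicate. It is not DataFusion's or the parquet crate's actual code: `RowGroup`, `may_contain`, and `count_eq` are hypothetical names, and the data is made up. It assumes the reader skips a row group whenever the literal falls outside the recorded `[min, max]` range, so a wrong min or max silently drops matching rows.

```python
from dataclasses import dataclass


@dataclass
class RowGroup:
    """Hypothetical stand-in for a Parquet row group and its column statistics."""
    values: list    # actual column values stored in the group
    stat_min: str   # minimum recorded in the file's statistics
    stat_max: str   # maximum recorded in the file's statistics


def may_contain(group: RowGroup, literal: str) -> bool:
    # Pruning check for `column = literal`: the group can only match
    # if the literal lies inside the recorded [min, max] range.
    return group.stat_min <= literal <= group.stat_max


def count_eq(groups: list, literal: str, prune: bool = True) -> int:
    total = 0
    for group in groups:
        if prune and not may_contain(group, literal):
            continue  # skipped purely on the strength of the statistics
        total += sum(1 for v in group.values if v == literal)
    return total


groups = [
    # Statistics agree with the data.
    RowGroup(["AUTOMOBILE", "BUILDING"], "AUTOMOBILE", "BUILDING"),
    # Assumed writer bug: the recorded min is higher than the true min.
    RowGroup(["BUILDING", "MACHINERY"], "FURNITURE", "MACHINERY"),
]

print(count_eq(groups, "BUILDING", prune=True))   # 1: second group wrongly skipped
print(count_eq(groups, "BUILDING", prune=False))  # 2: the expected count
```

With pruning disabled every group is scanned, so the count is correct regardless of the statistics, which matches the behaviour described above.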