andygrove edited a comment on issue #863:
URL: 
https://github.com/apache/arrow-datafusion/issues/863#issuecomment-897273083


   If I comment out the code in our Parquet reader that filters out row groups 
based on predicates, then I see the expected results.
   
   ```
   > SELECT COUNT(*) FROM customer WHERE c_mktsegment = 'BUILDING';
   +-----------------+
   | COUNT(UInt8(1)) |
   +-----------------+
   | 29998146        |
   +-----------------+
   1 row in set. Query took 0.874 seconds.
   ```
   
   My conclusion is that we have a bug in our DataFusion/Parquet writer that is 
somehow writing incorrect statistics.
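
To illustrate why incorrect statistics would produce this symptom, here is a minimal sketch of min/max row group pruning. It is not DataFusion's actual code: `RowGroup`, `may_contain`, and `count_equal` are hypothetical names, and the data is a toy example. A reader that trusts the statistics skips any row group whose recorded range excludes the predicate value, so a row group with a wrong range is silently dropped.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RowGroup:
    """Hypothetical stand-in for a Parquet row group (not a real API)."""
    stat_min: str      # minimum value recorded in the file metadata
    stat_max: str      # maximum value recorded in the file metadata
    values: List[str]  # values actually stored in the row group


def may_contain(rg: RowGroup, target: str) -> bool:
    # Equality predicate: the row group can only match if the target
    # lies within the recorded [min, max] range.
    return rg.stat_min <= target <= rg.stat_max


def count_equal(row_groups: List[RowGroup], target: str, prune: bool = True) -> int:
    total = 0
    for rg in row_groups:
        if prune and not may_contain(rg, target):
            continue  # skipped purely on the strength of the statistics
        total += sum(1 for v in rg.values if v == target)
    return total


row_groups = [
    # Statistics agree with the stored values.
    RowGroup("AUTOMOBILE", "BUILDING", ["AUTOMOBILE", "BUILDING", "BUILDING"]),
    # Assumed writer bug: the recorded minimum is too high, so the
    # statistics claim that 'BUILDING' cannot appear here.
    RowGroup("FURNITURE", "MACHINERY", ["BUILDING", "FURNITURE", "MACHINERY"]),
]

pruned = count_equal(row_groups, "BUILDING", prune=True)
unpruned = count_equal(row_groups, "BUILDING", prune=False)
print(pruned, unpruned)
```

With pruning enabled the second row group is skipped and its matching row is lost; with pruning disabled the count is complete. That matches the behaviour described above, where the expected count only appears once the filtering is commented out.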


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

