tustvold commented on issue #4005: URL: https://github.com/apache/arrow-datafusion/issues/4005#issuecomment-1295949956
Ok so the bug is that `DatafusionArrowPredicate` assumes that `ProjectionMask` is order preserving, and so does not remap the columns before passing them to `PhysicalExpr`. In reality `ProjectionMask` is not order preserving, and so the batch passed to `ArrowPredicate::evaluate` has its columns in the order of the file's schema. In the case of this query, this results in the predicates being evaluated against the wrong columns, so no rows pass the predicate.

You can see this by reordering the predicates in the query:

```
select count(*) from foo where pod = 'aqcathnxqsphdhgjtgvxsfyiwbmhlmg' OR container = 'backend_container_0';
+-----------------+
| COUNT(UInt8(1)) |
+-----------------+
| 39982           |
+-----------------+
1 row in set. Query took 0.107 seconds.
```

This is likely also the issue behind #4006
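
To make the remapping concrete, here is a rough std-only sketch of the idea. It is not the actual DataFusion fix and does not use the arrow or parquet crates. The helper names `remap_indices` and `reorder` are hypothetical, and so are the file-schema indices in the usage note. It assumes the predicate lists its columns in the order the expression was built, while the batch arrives with the same columns sorted by file-schema index.

```rust
/// For each column the predicate references (given as file-schema indices, in
/// the order the expression expects them), return the position of that column
/// in a batch whose columns are laid out in ascending file-schema order.
/// Hypothetical helper for illustration only.
fn remap_indices(predicate_columns: &[usize]) -> Vec<usize> {
    // A projection mask behaves like a set of columns, so the decoded batch
    // lists the selected columns sorted by their file-schema index.
    let mut batch_order: Vec<usize> = predicate_columns.to_vec();
    batch_order.sort_unstable();
    batch_order.dedup();

    predicate_columns
        .iter()
        .map(|col| {
            batch_order
                .binary_search(col)
                .expect("every predicate column is part of the mask")
        })
        .collect()
}

/// Rearrange the batch columns into the order the predicate expects.
/// Hypothetical helper; `T` stands in for a column array.
fn reorder<T: Clone>(batch_columns: &[T], mapping: &[usize]) -> Vec<T> {
    mapping.iter().map(|&i| batch_columns[i].clone()).collect()
}
```

For example, if `pod` were column 3 and `container` column 7 in the file (made-up indices), a predicate built as `container = ... OR pod = ...` references `[7, 3]`, but the batch arrives as `[pod, container]`. `remap_indices(&[7, 3])` gives `[1, 0]`, and `reorder(&["pod", "container"], &[1, 0])` puts `container` first again, so the expression sees the columns it was built against.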
