adriangb commented on issue #20325:
URL: https://github.com/apache/datafusion/issues/20325#issuecomment-3893737697

   > There are two major differences with the pushdown path:
   > 
   >     1. The IO pattern is different (first the data needed for filtering is fetched, and then only the pages for the rows passing are fetched for the projection). Without pushdown, all pages for all columns (both filter and projection) are fetched.
   > 
   >     2. The overhead of evaluating the filter, then selectively decoding only the rows that match for the projection columns. Without pushdown, all columns are decoded and then a single filter pass is applied afterwards.
   
   I think a third difference is parallelism. `FilterExec` often sits on top of a `RepartitionExec`, so its input is spread across all cores. With predicate pushdown the filter runs inside the scan, so unless there are more files than cores (and even then, files may differ in size) or intra-file repartitioning is turned on, the parallelism is going to be lower than with a `FilterExec`.
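
To make the parallelism point concrete, here is a rough toy model. It is not how DataFusion schedules work. It assumes filter cost is proportional to row count, that without intra-file repartitioning each file is handled by exactly one core, and that a `RepartitionExec` spreads rows evenly across cores. The function names `pushdown_makespan` and `repartitioned_makespan` are hypothetical and exist only for this sketch.

```rust
/// Toy model: rows filtered by the busiest core when the filter runs inside
/// the scan. Each file is an indivisible unit of work (no intra-file
/// splitting), assigned largest-first to the least-loaded core.
fn pushdown_makespan(file_rows: &[u64], cores: usize) -> u64 {
    let mut load = vec![0u64; cores.max(1)];
    let mut files = file_rows.to_vec();
    files.sort_unstable_by(|a, b| b.cmp(a)); // largest file first
    for rows in files {
        // Pick the core with the least work so far.
        let idx = (0..load.len()).min_by_key(|&i| load[i]).unwrap();
        load[idx] += rows;
    }
    load.into_iter().max().unwrap_or(0)
}

/// Toy model: rows filtered by the busiest core when a separate filter
/// operator sits above a repartition step that spreads rows evenly.
fn repartitioned_makespan(file_rows: &[u64], cores: usize) -> u64 {
    let total: u64 = file_rows.iter().sum();
    let cores = cores.max(1) as u64;
    (total + cores - 1) / cores // ceiling division
}

fn main() {
    // Three files of uneven size on an 8-core machine (made-up numbers).
    let files = [800, 100, 100];
    println!("filter in scan:     {}", pushdown_makespan(&files, 8));
    println!("filter after split: {}", repartitioned_makespan(&files, 8));
}
```

In this model the two paths only match when files are numerous and evenly sized (for example, 16 files of 100 rows each on 8 cores gives 200 for both). The model deliberately ignores the cost of repartitioning itself and the IO and decoding savings that pushdown provides.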


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

