[GitHub] [arrow-datafusion] Ted-Jiang commented on issue #3360: Support `RowFilter` in `ParquetExec`

GitBox Mon, 05 Sep 2022 05:46:13 -0700


Ted-Jiang commented on issue #3360:
URL: 
https://github.com/apache/arrow-datafusion/issues/3360#issuecomment-1236968574


   @thinkharderdev  Wow! Really looking forward to this! 💪
   
   > * I think it must be possible to control what predicates get pushed down to the scan, an expensive predicate may still make sense as a row group filter but not a row filter
   > * We could restrict the pushed down predicates to simple binary predicates on dictionary or primitive columns by default
   > * We should make visible in the explain plan what is being pushed down to what level
   > * We could use the sort order if any to inform the push down order
   > * We need benchmarks, lots of benchmarks 😆
   
   Nice write-up! Thanks 👍
   
   I think one thing we should talk about is how to define `non-selective predicates (expensive predicates)`.
   For now, if we want to check whether a predicate is selective on an unsorted column, we need to know how many pages remain after filtering, so we need to read the `col-index`. If the filter prunes zero pages, it will run slower than before. 🤔
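
To make the idea concrete, here is a rough sketch of how page-level min/max statistics could be used to estimate whether a predicate is selective enough to be worth pushing down. Everything in it is hypothetical: `PageStats`, `prunable_pages`, `should_push_down`, and the `min_pruned_fraction` threshold are illustrative names, not part of the DataFusion or parquet APIs, and the stats are assumed to have already been read from the column index.

```rust
/// Hypothetical per-page min/max statistics, standing in for what the
/// Parquet column index would provide for one column chunk.
#[derive(Debug, Clone, Copy)]
struct PageStats {
    min: i64,
    max: i64,
}

/// Counts the pages that the predicate `lo <= col <= hi` can skip outright.
/// A page is prunable when its [min, max] range does not overlap [lo, hi].
fn prunable_pages(pages: &[PageStats], lo: i64, hi: i64) -> usize {
    pages.iter().filter(|p| p.max < lo || p.min > hi).count()
}

/// Illustrative heuristic (an assumption, not DataFusion's behaviour):
/// treat the predicate as selective, and push it down as a row filter,
/// only if at least `min_pruned_fraction` of the pages can be skipped.
fn should_push_down(pages: &[PageStats], lo: i64, hi: i64, min_pruned_fraction: f64) -> bool {
    if pages.is_empty() {
        // No statistics available: nothing shows the filter would help.
        return false;
    }
    let pruned = prunable_pages(pages, lo, hi) as f64;
    pruned / pages.len() as f64 >= min_pruned_fraction
}
```

On a sorted column the page ranges barely overlap, so a narrow range predicate prunes most pages. On an unsorted column every page tends to span the whole value range, so nothing is pruned and the heuristic would keep the predicate out of the row filter.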
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
