alamb commented on issue #4028: URL: https://github.com/apache/datafusion/issues/4028#issuecomment-2271869472
## Background Reading

Here is a background article about parquet predicate pushdown: https://www.influxdata.com/blog/querying-parquet-millisecond-latency/#heading5

Specifically, the section on `Late materialization` describes what is done in the RowFilter code.

## Code links for Listing Table

Supports filter pushdown: https://docs.rs/datafusion/latest/datafusion/datasource/listing/struct.ListingTable.html#impl-TableProvider-for-ListingTable

## Conditions:

Here are the conditions @tustvold lists above, and some links about what they mean.

> Parquet predicate pushdown is enabled

This refers to the `datafusion.execution.parquet.pushdown_filters` configuration setting ([docs here](https://datafusion.apache.org/user-guide/configs.html)).

> The FileFormat is parquet

This should be relatively straightforward to determine.

> The predicate is fully pushed down by ParquetExec (not all predicates are supported)

"pushdown" refers to this code in ParquetExec (there is some background information in the parquet [RowFilter API](https://docs.rs/parquet/latest/parquet/arrow/arrow_reader/struct.RowFilter.html)):

https://github.com/apache/datafusion/blob/16a3557325eb8f949f4a87ab90c0a0b174dc8d86/datafusion/core/src/datasource/physical_plan/parquet/row_filter.rs#L43-L70
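
For anyone new to the idea, here is a rough, standalone sketch of what "late materialization" means. It is a toy model only: it does not use the parquet `RowFilter` API or any DataFusion code, and the function name `late_materialize` and the slice-based "columns" are hypothetical stand-ins. The point is just the order of operations: evaluate the predicate on the filter column first, then decode the other columns only for the rows that passed.

```rust
/// Toy late materialization (hypothetical helper, not the parquet API).
///
/// `filter_col` stands in for the column the predicate reads, and
/// `payload_col` for a column that is expensive to decode.
/// Returns the selected payload values and how many of them were decoded.
fn late_materialize(
    filter_col: &[i64],
    payload_col: &[&str],
    predicate: impl Fn(i64) -> bool,
) -> (Vec<String>, usize) {
    // Step 1: evaluate the predicate on the filter column only,
    // producing a per-row selection mask.
    let selection: Vec<bool> = filter_col.iter().map(|v| predicate(*v)).collect();

    // Step 2: "decode" the payload column only for the selected rows.
    let mut decoded = 0;
    let mut rows = Vec::new();
    for (value, keep) in payload_col.iter().zip(&selection) {
        if *keep {
            decoded += 1;
            rows.push(value.to_string());
        }
    }
    (rows, decoded)
}

fn main() {
    let ids = [1, 5, 10, 20];
    let names = ["a", "b", "c", "d"];

    // Only the two rows with id > 5 have their payload decoded.
    let (rows, decoded) = late_materialize(&ids, &names, |id| id > 5);
    println!("rows = {rows:?}, payload values decoded = {decoded}");
}
```

In this sketch the rows that fail the predicate never have their payload decoded, which is where the savings described in the article come from.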
